Abstract

In this article, the distribution of the number of node neighbors of a given node in homogeneous 
random hypergraphs of uniform rank r is calculated exactly for the full range of hypergraph 
densities. This is equivalent to the degree distribution of the projections of hypergraphs onto graphs 
(networks). The calculation is non-trivial due to the possibility that nodes belong simultaneously 
to multiple hyperedges. The node distribution contains a new combinatorial coefficient $Q_{r-1}(k,\ell)$,
which counts the number of distinct labelled hypergraphs of k nodes, $\ell$ hyperedges of rank r - 1,
and where every node is connected to at least one hyperedge. Two approaches are used to calculate
$Q_{r-1}(k,\ell)$: by the inclusion-exclusion principle of combinatorics, and by introducing an assembly
process that leads to all possible hypergraphs satisfying the node connectivity condition. Some
identities of $Q_{r-1}$ are derived and applied to the verification of normalization and the calculation
of moments of the node distribution. For sparse hypergraphs, the distribution is asymptotically
similar to the Poisson distribution, but exhibits strong fluctuations that decay as a power law of
the system size N, where the decay exponent is equal to the number of overlapping nodes for a
given degree. Therefore, any approximation of the distribution that ignores overlaps incurs an error
of order $N^{-1}$. Also, close to percolation, this bound implies that sparse hypergraphs and their
projections onto graphs are qualitatively similar to within this error. It is shown that the dense
limit cannot be explained if overlaps are ignored, and the correct asymptotic form is provided.

PACS numbers: 89.75.Hc, 02.10.Ox, 89.65.-s, 05.90.+m 
I. INTRODUCTION



Fuelled by the recent availability of digitized data from many sources, including social, 
technological, and natural systems, the scientific community has placed renewed interest into 
quantitative analysis of large datasets. In this context, complex networks theory has emerged
as one of the most active research areas providing new analytical techniques. In essence,
complex networks focuses on developing understanding of a system from its representation 
as a collection of objects called nodes and the relations between them, called edges. The set 
of nodes and edges together are known as a graph (in mathematics) or network (in complex 
networks theory and in physics). Some examples of network representations are people and 
their friendships, particles and their collisions, or statistical variables and their correlations. 

The techniques of complex networks are meant to be quite general. Some well studied
examples of graphs are social networks [2], power grids, and networks of infectious disease
propagation, although there are many more systems that are being tackled with these
techniques. The general approach of complex networks is to study the statistical properties of
a graph $\sigma$ or set of graphs $\{\sigma\}_{\rm conf}$, such as degree distribution (where degree is the number
of edges connected to a node, equivalent to the number of node neighbors in a graph),
distribution of shortest path lengths among nodes (which is at the core of the small world
notion and of six degrees of separation [5], [6]), and community structure (loosely defined as
groups of nodes among which there are more edges than with the rest of the graph) j3, Q]. 
Of all these properties, the degree distribution is perhaps the most widely used in ongoing 
research. 

In some systems, interactions occur in groups of nodes that may be larger than two. 
There are numerous examples of this, such as the social networks in which infectious diseases
propagate, or the statistical interactions between correlated events in financial systems.
Regardless of the context, when such multiway interactions occur it is convenient to use
hypergraphs, which generalize graphs by substituting edges with hyperedges, conglomerates
of nodes that interact together in groups of sizes two or larger. In this case, equivalent
notions exist to those of graphs, such as path length and degree [9]. This approach is
gradually gaining attention [10]-[13].



One particular metric, the degree distribution, slightly changes meaning in hypergraphs. 
While degree continues to be the number of hyperedges a node is connected to, this is no 
longer equivalent to the number of node neighbors a given node has. This latter quantity
(henceforth referred to as the neighbor distribution) has received little direct attention despite
its intuitive relevance. In this article, I focus on this quantity in the case of homogeneous
random hypergraphs of uniform rank r, and derive complete results that cover all hypergraph
densities. This is done via hypergraph projections onto graphs, as explained next [14].

To determine the neighbor distribution in hypergraphs, it is equivalent to look at projections
of hypergraphs onto graphs and calculate the usual degree distribution in the projected
graphs [15]. The projections are defined so that if two nodes are connected by any hyperedge
then the projected graph has an edge between those nodes. This relation is important because
it is customary to first attempt to use graphs whenever possible, typically weighted
graphs representing the projection of the hypergraph [15], [16]. It is worthwhile to point
out that another equivalence can be established between hypergraphs and bipartite networks,
as explained in Chap. 7 of Ref. [16]. For bipartite networks, the graph projection
corresponds to so-called one-mode networks, where once again, the degree distribution is
the quantity of interest. In this context some relevant work has been done that is related
to the topic of this article [17]-[19], but it is limited to the sparse limit, and therefore still
leaves unanswered questions.

The complication in calculating the neighbor distribution is that it exhibits a kind of
degeneracy due to the potential presence of some nodes in multiple hyperedges (node overlaps).
This makes the distribution calculation non-trivial. In tracking this degeneracy, the need
for a new combinatorial quantity emerges. This enumerative quantity is $Q_{r-1}(k,\ell)$ which,
as explained below, is the cardinality of the collection of sets of $\ell$ hyperedges of rank r
that visit exactly k distinct nodes. $Q_{r-1}(k,\ell)$ also corresponds to the number of distinct
labelled hypergraphs with k nodes and $\ell$ hyperedges of rank r - 1 such that all nodes belong
to at least one hyperedge. As far as the author is aware, this is the first study of
$Q_{r-1}(k,\ell)$; some partial results exist for the case of r - 1 = 2 in Refs. [20], [21], which
offer approximations. In this article, $Q_{r-1}(k,\ell)$ is calculated by two different methods,
and a number of identities relevant to the neighbor distribution are derived for it. The
calculation of $Q_{r-1}(k,\ell)$ allows for an exact solution to the neighbor distribution, as well
as the derivation of its sparse and dense asymptotics. In the conclusions, I briefly describe
how to tackle the full problem where rank r is no longer uniform.

A number of excellent recent publications [10]-[13] touch on a related form of the
neighbor distribution problem posed here, by counting neighbors multiple times if they are
part of different hyperedges. However, in those publications, the focus resides in the sparse
limit, where overlaps are small (see results in Sec. II), and therefore the error made is
asymptotically small, decaying in inverse proportion to the system size.

The structure of the paper is as follows: Sec. II focuses on constructing the basics of
hypergraph projections onto graphs, and showing the expressions for the neighbor distribution
of the projected graphs in general and in the dense and sparse limits. Section III deals with
the calculation of $Q_{r-1}(k,\ell)$ by two methods: the inclusion-exclusion principle of
combinatorics, and graph assembly. The latter method is developed for r = 3 and additional
results are developed to apply it to $Q_{r-1}(k,\ell)$, i.e., general r. In order to apply
$Q_{r-1}(k,\ell)$ to the neighbor distribution, a number of combinatorial identities are derived
and presented in Sec. IV. The conclusions are presented in Sec. V.



II. HYPERGRAPH TO GRAPH PROJECTIONS AND THE CALCULATION OF 
THE NEIGHBOR DISTRIBUTION 

Consider a hypergraph $\sigma$ consisting of a set of nodes 1, ..., N, and a set of hyperedges of
rank r. Each hyperedge has r nodes $i_1, \dots, i_r$, and is assigned an indicator $\sigma_{i_1,\dots,i_r}$ equal to
1 if it is present in $\sigma$, and 0 if it is absent. For simplicity, I focus on undirected hypergraphs
(indicators $\sigma_{i_1,\dots,i_r}$ are symmetric under permutations of $i_1, \dots, i_r$). The hypergraphs are
also homogeneous and non-interacting, where all hyperedges have equal probability p to
occur. Using the homogeneity and absence of interaction, the probability $P(\sigma)$ to observe
configuration $\sigma$ is given by

$$P(\sigma) = p^{L(\sigma)} (1-p)^{\binom{N}{r} - L(\sigma)} \qquad (1)$$

where $L(\sigma)$ is the number of hyperedges in $\sigma$. By defining T(N, r) as the set of all possible
hyperedges, the result above can also be written as

$$P(\sigma) = \prod_{(i_1,\dots,i_r) \in T(N,r)} p^{\sigma_{i_1,\dots,i_r}} (1-p)^{1-\sigma_{i_1,\dots,i_r}} \qquad (2)$$

where $\sigma_{i_1,\dots,i_r}$ are the hyperedge indicators of $\sigma$.

The general hypergraph projection onto a graph is defined as a function V applied
over the hyperedges of $\sigma$ that produces the adjacency matrix $w_{ij}$ for the projected weighted
graph $G(\sigma)$. Each $w_{ij}$ is the indicator for edge ij in G, but $w_{ij}$ can be any non-negative
real number, making G a weighted graph. $G(\sigma)$ is formed by the same node
set as $\sigma$, together with edges that satisfy $w_{ij} > 0$. Note that if a node does not belong
to any hyperedge, it is isolated in both $\sigma$ and G. For given $\sigma$, one can define the subset
$O_{ij}(\sigma) := \{(i_1,\dots,i_r) \mid (i_1,\dots,i_r) \in \sigma \wedge i \in \{i_1,\dots,i_r\} \wedge j \in \{i_1,\dots,i_r\}\}$ of its hyperedges
that include simultaneously nodes i and j. It is natural to study projections of the type

$$w_{ij}(G) = V(o_{ij}(\sigma)), \qquad (3)$$

where $o_{ij} = |O_{ij}(\sigma)|$ is the size (cardinality) of $O_{ij}(\sigma)$. Thus, the weight of link ij in G only
depends on the number of hyperedges that contain i and j (an intuitive choice, although
certainly not the only possible model). Furthermore, it is sensible to introduce the additional
assumption that $w_{ij} > 0$ iff $o_{ij} > 0$, or in other words, any pair of nodes ij in the graph
has non-zero weight if its corresponding $O_{ij}$ is not empty. An illustration of the projection
process for the case $w_{ij} = V(o_{ij}) = o_{ij}$ and r = 3 is shown in Fig. 1.
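The projection with weights equal to the overlap counts can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the hypergraph is represented as a list of node tuples, and the function names `project` and `neighbor_counts` are hypothetical, not taken from the article.

```python
from collections import Counter
from itertools import combinations


def project(hyperedges):
    """Return o_ij, the number of hyperedges that contain both i and j."""
    overlap = Counter()
    for edge in hyperedges:
        # every pair of nodes inside a hyperedge gains one unit of weight
        for i, j in combinations(sorted(set(edge)), 2):
            overlap[(i, j)] += 1
    return overlap


def neighbor_counts(hyperedges, nodes):
    """Number of neighbors of each node in the projected graph (w_ij > 0)."""
    k = {v: 0 for v in nodes}
    for (i, j), o in project(hyperedges).items():
        if o > 0:
            k[i] += 1
            k[j] += 1
    return k


# two rank-3 hyperedges sharing nodes b and i; node z is isolated
sigma = [("a", "b", "i"), ("b", "c", "i")]
weights = project(sigma)
degrees = neighbor_counts(sigma, ["a", "b", "c", "i", "z"])
```

In this small case the pair (b, i) appears in both hyperedges, so its weight is 2 while node i still has only three neighbors, which is the overlap effect the rest of the section tracks.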

For projections such as those defined above, the number of neighbors of node i in $G(\sigma)$ is
given by

$$k_i(\sigma) = \sum_{j=1;\,j\neq i}^{N} \theta(o_{ij}(\sigma)) = \sum_{j=1;\,j\neq i}^{N} \theta\!\left(\sum_{(i_1,\dots,i_r) \in O_{ij}(\sigma)} \sigma_{i_1,\dots,i_r}\right), \qquad (4)$$

where $\theta(x)$ is the Heaviside step function, equal to zero if $x \le 0$, and 1 if $x > 0$. To determine
the neighbor distribution $\psi_i(k_i,p)$, one uses

$$\psi_i(k_i,p) = \sum_{\sigma \in \{\sigma\}_{\rm conf}} P(\sigma)\, \delta\!\left[\sum_{j=1;\,j\neq i}^{N} \theta(o_{ij}(\sigma)),\, k_i\right]$$
$$\qquad = \sum_{\sigma_{1,\dots,r}=0}^{1} p^{\sigma_{1,\dots,r}} (1-p)^{1-\sigma_{1,\dots,r}} \cdots \sum_{\sigma_{N-r+1,\dots,N}=0}^{1} p^{\sigma_{N-r+1,\dots,N}} (1-p)^{1-\sigma_{N-r+1,\dots,N}}\, \delta\!\left[\sum_{j=1;\,j\neq i}^{N} \theta(o_{ij}(\sigma)),\, k_i\right], \qquad (5)$$

where $\{\sigma\}_{\rm conf}$ represents the set of all configurations contained in the homogeneous
non-interacting hypergraph ensemble, and $\delta$ corresponds to a Kronecker delta. Equation (2)
allows factorizing the sum over configurations in Eq. (5) to produce the second line of the
equation. Only configurations $\sigma$ for which $\sum_{j=1;\,j\neq i}^{N} \theta(o_{ij}(\sigma))$ is equal to $k_i$ contribute to
$\psi_i(k_i,p)$. Only hyperedges where one of the indices $i_1, \dots, i_r$ is equal to i are relevant to
$k_i$; all other hyperedges contribute the factor $\sum_{\sigma_{i_1,\dots,i_r}=0}^{1} p^{\sigma_{i_1,\dots,i_r}} (1-p)^{1-\sigma_{i_1,\dots,i_r}} = 1$. Let us
label $T_i(N,r)$ the set of hyperedges that contribute to $k_i$ over all possible configurations.
As explained in the following, completing the calculation of $\psi_i(k_i,p)$ requires determining
the terms in Eq. (5) that lead the delta to be 1, which is equivalent to finding all sets of
hyperedges $(i_1,\dots,i_r) \in T_i(N,r)$ where $\sigma_{i_1,\dots,i_r} = 1$, and the nodes involved in the set visit
exactly $k_i$ nodes as well as i.

The presence of the $\theta$ function in the definition of $k_i$ is the source of degeneracy in the
calculation of node neighbors. Thus, several configurations $\sigma$ can lead to the same number
of neighbors in a projected network. Figure 2 illustrates the different possible situations.
From the figure, note the ways in which $k_i = 3$ can emerge from various hypergraphs. All
the possible hyperedge configurations that lead to $k_i = 3$ involving nodes {a,b,c,i} (Fig. 2
left and top right panels) are as follows: i) (a,b,i) and (b,c,i), ii) (a,b,i) and (a,c,i), iii)
(a,c,i) and (b,c,i), or iv) (a,b,i), (a,c,i), and (b,c,i). Three of the possibilities have two
hyperedges (denoted by $\ell_i = 2$), and the last possibility has three hyperedges ($\ell_i = 3$). If
the hyperedges involved another set of nodes, say {a,b,d,i}, a similar situation would
occur. Note that $\ell_i = 2$ can also generate $k_i = 4$ (e.g., bottom right of Fig. 2, where two
hyperedges involve the nodes {a,b,c,d,i}). All the cases just described play a role in the
calculation of $\psi_i(k_i,p)$.

The examples above provide a way to proceed with the calculation. First, one can
concentrate on a specific set of $k_i$ nodes that connect to i, say $\rho(k_i)$, which guarantees that
the degree is $k_i$ (the choice of $\rho(k_i)$ must be feasible, i.e., $k_i$ cannot be equal to 1, ..., r - 2).
Consider $\rho(k_i)$ = {a,b,c} and define $Q_{r-1}(k_i,\ell_i)$, the number of ways to achieve degree $k_i$
from set $\rho(k_i)$ using $\ell_i$ hyperedges of rank r. Hence, for r = 3, $Q_2(k_i = 3, \ell_i = 2) = 3$ (Fig. 2
left panel) and $Q_2(k_i = 3, \ell_i = 3) = 1$ (Fig. 2 top right panel). A second example is presented
in Fig. 2 (bottom right panel) for $\rho(k_i)$ = {a,b,c,d}, producing $Q_2(k_i = 4, \ell_i = 2) = 3$. The
sub-index r - 1 in Q comes from the fact that each hyperedge connected to i also connects
to r - 1 other nodes which form an (r - 1)-hyperedge with each other; for r = 3, as in Fig. 2,
these (r - 1)-hyperedges are simply edges between nodes, such as (a,b), (a,c), or (b,c).
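The small values quoted above can be checked by direct enumeration. The following is a brute-force sketch, practical only for small k; the function name `q_brute_force` and the labelling of nodes by integers are assumptions made for illustration.

```python
from itertools import combinations


def q_brute_force(k, ell, rank):
    """Count sets of `ell` distinct hyperedges of size `rank` on k labelled
    nodes such that every node belongs to at least one hyperedge."""
    candidates = list(combinations(range(k), rank))
    count = 0
    for chosen in combinations(candidates, ell):
        covered = set().union(*chosen)  # nodes visited by the chosen hyperedges
        if len(covered) == k:
            count += 1
    return count


# rank = r - 1 = 2 corresponds to the r = 3 examples of Fig. 2
q_3_2 = q_brute_force(3, 2, 2)
q_3_3 = q_brute_force(3, 3, 2)
q_4_2 = q_brute_force(4, 2, 2)
```

The cost grows combinatorially with k and the number of hyperedges, so this serves only as a cross-check on closed formulas.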

Applying the ideas of the previous paragraph, one can determine that the contribution
to $\psi_i(k_i,p)$ from a specific set $\rho(k_i)$ of nodes and number of hyperedges $\ell_i$ is given by
$Q_{r-1}(k_i,\ell_i)\, p^{\ell_i} (1-p)^{\binom{N-1}{r-1} - \ell_i}$, where $\binom{N-1}{r-1}$ comes from the size of $T_i(N,r)$. Note also that $\ell_i$
must satisfy some constraints for given $k_i$: in order to be able to visit $k_i$ nodes, the smallest
number of hyperedges necessary is $\lceil k_i/(r-1) \rceil \le \ell_i$, where $\lceil \cdot \rceil$ is the ceiling function; also,
there are $\binom{k_i}{r-1}$ ways to choose node groups of size r - 1 out of $k_i$ nodes, and thus $\ell_i \le \binom{k_i}{r-1}$.
Therefore, conditional on $\rho(k_i)$,

$$\psi_i(k_i,p \mid \rho(k_i)) = \sum_{\ell_i = \lceil k_i/(r-1) \rceil}^{\binom{k_i}{r-1}} Q_{r-1}(k_i,\ell_i)\, p^{\ell_i} (1-p)^{\binom{N-1}{r-1} - \ell_i}.$$

The final step is to note that there are $\binom{N-1}{k_i}$ ways to select $\rho(k_i)$, leading to [15]

$$\psi_i(k_i,p) = \binom{N-1}{k_i} \sum_{\ell_i = \lceil k_i/(r-1) \rceil}^{\binom{k_i}{r-1}} Q_{r-1}(k_i,\ell_i)\, p^{\ell_i} (1-p)^{\binom{N-1}{r-1} - \ell_i}. \qquad (6)$$

Figure 3 shows examples of $\psi_i(k_i,p)$ from analytics and simulations, for the general case
("intermediate" p), and r = 3, 4; Fig. 4 does the same for the sparse and dense cases (small
and large p).
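Equation (6) can be evaluated exactly for small systems. The sketch below uses rational arithmetic and takes the coefficient $Q_{r-1}(k,\ell)$ from the inclusion-exclusion sum described in Sec. III A; the function names `q_coeff` and `neighbor_distribution` are illustrative assumptions.

```python
from fractions import Fraction
from math import comb


def q_coeff(k, ell, r):
    """Q_{r-1}(k, ell) by inclusion-exclusion over the m retained nodes."""
    return sum((-1) ** (k - m) * comb(k, m) * comb(comb(m, r - 1), ell)
               for m in range(k + 1))


def neighbor_distribution(N, r, p):
    """List of psi_i(k, p) for k = 0, ..., N - 1, following Eq. (6)."""
    M = comb(N - 1, r - 1)          # number of hyperedges that contain node i
    psi = []
    for k in range(N):
        lo = -(-k // (r - 1))       # ceil(k / (r - 1)) in integer arithmetic
        hi = comb(k, r - 1)
        inner = sum(q_coeff(k, ell, r) * p ** ell * (1 - p) ** (M - ell)
                    for ell in range(lo, hi + 1))
        psi.append(comb(N - 1, k) * inner)
    return psi


psi_small = neighbor_distribution(4, 3, Fraction(1, 2))
psi_rank4 = neighbor_distribution(7, 4, Fraction(1, 5))
```

Passing p as a `Fraction` keeps every term exact, so normalization can be confirmed without rounding error; infeasible degrees such as k = 1 for r = 3 come out as zero because the sum over the number of hyperedges is empty.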

In the sparse case, close to the percolation threshold of the hypergraphs, large fluctuations
appear in the distribution at relatively small $k_i$. This behavior emerges because, at small
p, the likelihood that hyperedges share multiple nodes (node overlaps) is low, and such overlaps
occur when $k_i$ is not a multiple of r - 1. To explain this, consider the low density regime when
$p \approx \alpha p_c$ with $\alpha$ a constant of order $\ge 1$ ($\alpha = 1$ is the percolation threshold as derived in
Ref. [10], with $p_c = \left[(N-1)\binom{N-2}{r-2}\right]^{-1}$). In this regime $\psi_i(k_i,p)$ can in fact be well approximated
by using only $\ell = \lceil k_i/(r-1) \rceil$, i.e.



$$\psi_i(k_i,p) \approx \binom{N-1}{k_i} Q_{r-1}\!\left(k_i, \left\lceil \tfrac{k_i}{r-1} \right\rceil\right) p^{\lceil k_i/(r-1) \rceil} (1-p)^{\binom{N-1}{r-1} - \lceil k_i/(r-1) \rceil}, \qquad [p = \alpha p_c]. \qquad (7)$$

The direct calculation of $Q_{r-1}(k_i, \lceil k_i/(r-1) \rceil)$ is addressed in Sec. IV B. To determine
whether $k_i$ is a multiple of r - 1, one can introduce $g = \mathrm{mod}(k_i, r-1)$, where $0 \le g \le r-2$.
If g = 0, $k_i$ is a multiple of r - 1. On the other hand, when $g \neq 0$, there are r - 1 - g
node overlaps. For very large N, $p \approx \alpha (r-2)!/N^{r-1}$, which together with Stirling's
approximation and $Q_{r-1}(k_i, \lceil k_i/(r-1) \rceil)$ from Sec. IV B lead to $\psi_i(k_i,p) \to \psi_i(k_i,p,g)$ in the
sparse limit


iP,{k„p,g)^N3-^'~''^ 



g-«/(r-l) 



a 



lk,/{T-l)^ 



ki 



p 



air - 2) 



r-l 



(8) 



where 



r — 1 



1; 



9 = 
g = r-2 



(9) 



^1 {r, fel) 
Also, for the purposes of these approximations, one takes

$$\left\lceil \frac{k_i}{r-1} \right\rceil = \begin{cases} \dfrac{k_i}{r-1}, & g = 0 \\[2mm] \dfrac{k_i + r - 1 - g}{r-1}, & 1 \le g \le r-2 \end{cases} \qquad (10)$$

which gives the correct value of $\lceil k_i/(r-1) \rceil$ for the specific g listed (for instance,
$(k_i+1)/(r-1)$ when g = r - 2). Equation (8) is quite
informative. When g = 0, the degree distribution is strictly Poisson, but when $g \neq 0$,
an asymptotic attenuation factor of the form $N^{g-(r-1)}$ appears, which indicates that the
probability to observe a single node overlap (1 = r - 1 - g = r - 1 - (r - 2)) is reduced by a
factor 1/N, a 2 node overlap (2 = r - 1 - g = r - 1 - (r - 3)) by $1/N^2$, etc. The qualitative
relevance of this result is that approximations of hypergraphs that consider the hyperedges
as non-overlapping when projected onto a graph incur an error of order $N^{-1}$ in $\psi_i(k_i,p)$ in
the sparse limit. In Fig. 4(a), (b) and (c), the actual distribution (as given by Eq. (6)) is
plotted against simulations, and the curves of Eq. (8) are superimposed for confirmation;
one plot is performed on a system size much larger than those available for Monte Carlo
simulation, but shows the best adherence to asymptotics. The case g = 0 is an envelope for
the distributions when $N \to \infty$.

In the dense limit, when overlaps are ignored the error made is much greater because in
this case the number of neighbors is matched to the number of hyperedges a node connects
to, which can be as large as $\binom{N-1}{r-1}$. In reality, $k_i$ can be at most equal to N - 1. With the
results in Sec. III B 4 and, in particular, the realization that for large $\ell_i$, $Q_{r-1}(k_i,\ell_i)$ can be
approximated as $\binom{\binom{k_i}{r-1}}{\ell_i}$ (see Fig. 6(b)), the simple approximation

$$\psi_i(k_i,p) \approx \binom{N-1}{k_i} (1-p)^{\binom{N-1}{r-1} - \binom{k_i}{r-1}}, \qquad [\text{large } p] \qquad (11)$$

for finite and relatively large p becomes satisfactory. This can be obtained by algebraic
manipulation and the use of the gaussian approximation for the summand of Eq. (6). Note
that the limit $p \to 1$ is correctly obtained: for all $k_i < N - 1$, the exponent of 1 - p is
positive, and as p approaches 1, $\psi_i(k_i,p) \to 0$; only $k_i = N - 1$ makes the exponent of 1 - p
equal to zero, producing the result $\psi_i(k_i = N-1, p = 1) \to \binom{N-1}{N-1} = 1$. Figure 4(d) shows
examples of the dense estimate Eq. (11) against Eq. (6), which agree well with each other.

To fully specify Eq. (6), it is necessary to determine $Q_{r-1}(k_i,\ell_i)$. In order to achieve this,
it is important to develop some intuition about the meaning of $Q_{r-1}(k_i,\ell_i)$. The case r = 3 is
very useful. Each hyperedge (in this case a triplet) connects i to two other nodes taken out of
$\rho(k_i)$, and clearly all nodes in $\rho(k_i)$ are visited at least once so that the degree is equal to $k_i$.
On any two nodes of $\rho(k_i)$, say a and b, the 3-hyperedge that connects them to i acts as an
edge between a and b. Given that there are $\ell_i$ hyperedges available to achieve $k_i$, determining
$Q_2(k_i,\ell_i)$ is equivalent to enumerating all distinct labelled graphs of $k_i$ nodes and $\ell_i$ edges,
in which all nodes have degree at least one; there are no isolated nodes. Henceforth, I refer
to these graphs as conditioned graphs. In the examples in Fig. 2, the cases contributing to
$k_i = 3$ and $\ell_i = 2$ are: i) (a,b) and (b,c), ii) (a,b) and (a,c), and iii) (a,c) and (b,c), and
the case contributing to $k_i = 3$ and $\ell_i = 3$ is (a,b), (a,c) and (b,c). When the problem is
generalized, $Q_{r-1}(k_i,\ell_i)$ corresponds to the number of distinct labelled hypergraphs with $k_i$
nodes and $\ell_i$ hyperedges of rank r - 1 such that all nodes belong to at least one hyperedge [22].
In the next section, the calculation of $Q_2(k_i,\ell_i)$ is tackled through different techniques,
leading to the two formulas (Eqs. (17) and (35), where the first one is valid for all r).

To conclude this section, I determine $\langle k_i \rangle$ using $P(\sigma)$ (later on, this calculation is repeated
using $\psi_i(k_i,p)$ and identities relevant to $Q_{r-1}(k_i,\ell_i)$). By definition

$$\langle k_i \rangle = \sum_{\sigma \in \{\sigma\}_{\rm conf}} k_i(\sigma) P(\sigma) = \sum_{\sigma \in \{\sigma\}_{\rm conf}} \sum_{j=1;\,j\neq i}^{N} \theta(o_{ij}(\sigma)) P(\sigma) = \sum_{j=1;\,j\neq i}^{N} \sum_{\sigma \in \{\sigma\}_{\rm conf}} \theta(o_{ij}(\sigma)) P(\sigma). \qquad (12)$$

Concentrating on the sum over $\sigma$

$$\sum_{\sigma \in \{\sigma\}_{\rm conf}} \theta(o_{ij}(\sigma)) P(\sigma) = \sum_{\sigma \in \{\sigma\}_{\rm conf};\; o_{ij}(\sigma) \ge 1} P(\sigma) = 1 - \sum_{o_{ij}(\sigma) = 0} P(\sigma), \qquad (13)$$

where one uses the realization that $\theta(o_{ij}(\sigma)) = 1$ in all hypergraphs where $o_{ij} \ge 1$, and
zero if $o_{ij} = 0$. To determine the last sum, one uses the independence of the hyperedges in
Eq. (2), and therefore

$$\sum_{o_{ij}(\sigma) = 0} P(\sigma) = \sum_{\sigma_{1,\dots,r}=0}^{1} p^{\sigma_{1,\dots,r}} (1-p)^{1-\sigma_{1,\dots,r}} \cdots \sum_{\sigma_{N-r+1,\dots,N}=0}^{1} p^{\sigma_{N-r+1,\dots,N}} (1-p)^{1-\sigma_{N-r+1,\dots,N}} = (1-p)^{\binom{N-2}{r-2}} \qquad (14)$$

because $\sigma_{i_1,\dots,i_r} = 0$ if both i and j are among the indices, and the corresponding factors of
1 - p factor out of the sums over the other $\sigma_{i_1,\dots,i_r}$ (each of which is equal to 1). Since this
result is independent of j,

$$\langle k_i \rangle = \sum_{j=1;\,j\neq i}^{N} \left[1 - (1-p)^{\binom{N-2}{r-2}}\right] = (N-1)\left[1 - (1-p)^{\binom{N-2}{r-2}}\right]. \qquad (15)$$

Higher moments can also be calculated this way, but they introduce couplings among indices,
and the previous approach becomes much harder. In Sec. IV A a more powerful approach
is developed, making the calculation of higher moments more straightforward. Note that the
low density regime $p = \alpha p_c$ corresponds to $\langle k_i \rangle \approx \alpha$.
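The mean in Eq. (15) can be cross-checked on a very small system by summing $k_i(\sigma) P(\sigma)$ over every configuration, as in Eq. (12). The sketch below is exhaustive, so it is feasible only when the number of possible hyperedges is small; the function names are hypothetical and p is assumed to be a `Fraction` so that the comparison is exact.

```python
from fractions import Fraction
from itertools import combinations, product
from math import comb


def mean_neighbors_exhaustive(N, r, p, i=0):
    """<k_i> obtained by summing k_i(sigma) P(sigma) over all configurations."""
    all_edges = list(combinations(range(N), r))
    mean = Fraction(0)
    for indicators in product((0, 1), repeat=len(all_edges)):
        L = sum(indicators)
        prob = p ** L * (1 - p) ** (len(all_edges) - L)   # Eq. (1)
        neighbors = set()
        for edge, present in zip(all_edges, indicators):
            if present and i in edge:
                neighbors.update(edge)                     # nodes sharing a hyperedge with i
        neighbors.discard(i)
        mean += len(neighbors) * prob
    return mean


def mean_neighbors_formula(N, r, p):
    """Closed form of Eq. (15)."""
    return (N - 1) * (1 - (1 - p) ** comb(N - 2, r - 2))


p = Fraction(1, 3)
exhaustive = mean_neighbors_exhaustive(5, 3, p)   # 2**10 configurations
closed_form = mean_neighbors_formula(5, 3, p)
```

For N = 5 and r = 3 there are only 10 possible hyperedges, so the loop visits 1024 configurations.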



III. CALCULATIONS OF $Q_{r-1}(k,\ell)$

To determine $Q_{r-1}(k_i,\ell_i)$, I proceed by focusing on the enumeration of the conditioned
graphs/hypergraphs mentioned above. To avoid confusion, it is very important to emphasize
that the graphs and hypergraphs considered in this section are not those in $\{\sigma\}_{\rm conf}$, but
instead are tools to determine $Q_{r-1}$; if desired, they can be interpreted directly in the context
of $\{\sigma\}_{\rm conf}$ [22], but it is not necessary. The nodes are labelled, consistent with the selection
of sets $\rho(k_i)$. In the calculations of this section, given a choice $\rho(k_i)$ with $k_i$ nodes and $\ell_i$
hyperedges of rank r - 1, the node i is irrelevant and therefore the subindex i is dropped.



A. Inclusion-Exclusion formula 



The combinatorial coefficient $Q_{r-1}(k,\ell)$ can be determined via the inclusion-exclusion
principle of combinatorics [23]. The idea behind this principle is to count the number of
elements in a set that satisfies certain conditions through a series of alternating overcounts
and undercounts. Focusing on $Q_2(k,\ell)$ as the enumeration of conditioned graphs, a simple
overcount of the conditioned graphs is $\binom{\binom{k}{2}}{\ell}$, the number of graphs with k nodes and $\ell$ edges,
where there are $\binom{k}{2}$ places to locate $\ell$ edges. This overcounts $Q_2(k,\ell)$ because it ignores the
condition of all nodes being connected to at least one edge. If the configurations in which at
least one node is not connected are taken away from the previous enumeration, the correct
result is obtained. To approach this, one makes a first correction by taking away $\binom{k}{k-1}\binom{\binom{k-1}{2}}{\ell}$,
which counts all choices of k - 1 nodes picked out of k multiplied by the number of graphs
formed with k - 1 nodes and $\ell$ edges. This step has now eliminated all configurations that
have nodes disconnected, but has eliminated multiple times all graphs in which two or more
nodes are not connected to an edge. To correct for this, it is necessary to add $\binom{k}{k-2}\binom{\binom{k-2}{2}}{\ell}$.
Once again there are unwanted graphs in this count which require further correction. It is
straightforward to continue this until the point when the choice of m nodes chosen out of k
is small enough that $\binom{m}{2} < \ell$, at which point the sequence stops. These considerations lead
to the expression

$$Q_2(k,\ell) = \sum_{m=0}^{k} (-1)^{k-m} \binom{k}{m} \binom{\binom{m}{2}}{\ell}. \qquad (16)$$

The extension to arbitrary r is direct, producing

$$Q_{r-1}(k,\ell) = \sum_{m=0}^{k} (-1)^{k-m} \binom{k}{m} \binom{\binom{m}{r-1}}{\ell}. \qquad (17)$$

In terms of direct calculation, this formula is useful in producing a numerical result, but it
is not so easy to interpret on the basis of k and $\ell$, and some calculations that depend on it
become difficult due to the alternating signs (e.g. asymptotics).
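To make the alternating structure of the inclusion-exclusion sum concrete, the sketch below lists the individual terms of Eq. (17) before adding them. The function names `q_terms` and `q_inclusion_exclusion` are assumptions made for this illustration.

```python
from math import comb


def q_terms(k, ell, r):
    """Signed terms of Eq. (17), indexed by m = 0, ..., k retained nodes."""
    return [(-1) ** (k - m) * comb(k, m) * comb(comb(m, r - 1), ell)
            for m in range(k + 1)]


def q_inclusion_exclusion(k, ell, r):
    """Q_{r-1}(k, ell) as the sum of the signed terms."""
    return sum(q_terms(k, ell, r))


terms = q_terms(6, 4, 3)          # graphs (r - 1 = 2) with 6 nodes and 4 edges
nonzero = [t for t in terms if t != 0]
total = q_inclusion_exclusion(6, 4, 3)
```

For k = 6 and four edges, the three surviving terms are each comparable to or larger than the final count in magnitude, which illustrates why asymptotic estimates based directly on this sum are delicate.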

B. Assembly of $Q_2(k,\ell)$

An alternative to inclusion-exclusion is that of graph assembly. In this section, I explain
how to compute $Q_{r-1}(k,\ell)$ with r = 3 through this method. The extension to arbitrary r is
explained in Sec. III C and, though it is straightforward, it is admittedly cumbersome. The
picture developed here is more intuitive than inclusion-exclusion, and opens the possibility
to study the properties of $Q_{r-1}(k,\ell)$ further. To develop the procedure to count assemblies
leading to the conditioned graphs, small examples are presented where $\ell$ is close to its
minimum possible value for given k. These examples exhibit all the aspects necessary to
deal with the general $Q_2(k,\ell)$ case, which is studied in Sec. III B 4.

1. Preliminaries and simple examples. Types of edges 

In order to determine $Q_2(k,\ell)$ via assembly, one begins with k isolated nodes and adds
edges, totalling $\ell$, so that every node is connected to an edge. To find all distinct graphs
that contribute to $Q_2(k,\ell)$, one first needs to determine all possible ways to assemble those
graphs. The number of distinct assemblies is larger than $Q_2(k,\ell)$, but is trivially corrected
to yield $Q_2(k,\ell)$, as explained below. For the assembly process, the critical ingredient is
knowledge of the number of distinct ways in which a given edge can enter into the graph.
I now proceed to describe this enumeration (refer throughout this section to Fig. 5 for a
specific example of assembly, along with the relevant notation).

Consider the initial state of k isolated nodes. At this initial step of the process, there
are $\binom{k}{2}$ possible pairs of nodes in which an edge can be placed. After the first step of edge
addition, 2 nodes become used (or discovered). Let us define the vector $\vec{\zeta}$ which characterizes
the edge addition process. This vector has length $\ell$ (i.e. its dimension $\dim \vec{\zeta} = \ell$). The $\tau$-th
component of the vector, $\zeta_\tau$, is the number of nodes that are discovered by the addition of
the $\tau$-th edge in the assembly; for the first step, $\zeta_{\tau=1} = 2$. Another useful definition is $u_\tau$,
the number of nodes discovered up to step $\tau$. In all assemblies, $\zeta_{\tau=1} = 2$ and $u_{\tau=1} = 2$.
After the first edge is added, there are in total $\binom{k}{2}$ distinct graphs.

Each additional edge generates an enumeration depending on the nodes that are
involved in that edge. To illustrate this, consider the possibilities when adding the second
edge, i.e. $\tau = 2$. The first possibility corresponds to edge $\tau = 2$ being used to discover
two new nodes out of the remaining k - 2 undiscovered nodes, among which there are
$\binom{k-2}{2}$ possible node pairs. This leads to a total of $\binom{k}{2}\binom{k-2}{2}$ distinct graph assemblies, where
$u_{\tau=2} = 4$. In $\vec{\zeta}$, component $\zeta_{\tau=2} = 2$ because the second edge discovers 2 new nodes. The
second possibility for $\tau = 2$ corresponds to adding an edge that connects one of the two
nodes already in use to one of the k - 2 undiscovered nodes. This leads to $\binom{k}{2}(2(k-2))$
distinct graph assemblies because the second edge has 2 choices among discovered nodes and
k - 2 choices among undiscovered nodes; $u_{\tau=2} = 3$ and $\zeta_{\tau=2} = 1$.

When adding edges, if they find any previously disconnected nodes, then they contribute
to visiting all k nodes. It is convenient to introduce notation for these edges. If the addition
of an edge at a given step $\tau$ discovers two unused nodes, this edge is counted into $\ell_2$ and
is described as being a type $\ell_2$ edge. On the other hand, if at $\tau$ an edge connects a node
already discovered in a step $< \tau$ to an undiscovered node, it counts into $\ell_1$ and is referred to
as a type $\ell_1$ edge. For an arbitrary step $\tau$ in the assembly, type $\ell_2$ edges are associated with
a factor $\binom{k - u_{\tau-1}}{2}$ in the enumeration because they connect 2 out of the remaining $k - u_{\tau-1}$
undiscovered nodes; type $\ell_1$ edges are associated with a factor $u_{\tau-1}(k - u_{\tau-1})$ because they
connect one of the $u_{\tau-1}$ discovered nodes to one of the $k - u_{\tau-1}$ undiscovered nodes. The
counts $\ell_1$ and $\ell_2$ are part of the total number of edges $\ell$. Another kind of edge is possible,
which connects two nodes already discovered; these edges are counted by $\ell_0$ and referred to
as type $\ell_0$ edges (the 0 refers to the fact that their introduction does not contribute to k
because they do not discover new nodes). Below, I will give examples of the enumeration
for type $\ell_0$ edges. The relation between $\ell_2, \ell_1, \ell_0$ and $k, \ell$ is summarized in the equations

$$\ell = \ell_2 + \ell_1 + \ell_0 \qquad (18)$$
$$k = 2\ell_2 + \ell_1, \qquad (19)$$

where only integer non-negative solutions are allowed.

At this point, it is useful to make a few simple calculations that illustrate the ideas just
described. First, consider k even, and let us assemble a conditioned graph with the minimum
number of edges possible. Clearly, $\ell = k/2$, where each edge must connect a new pair of
undiscovered nodes until all nodes are discovered, and therefore $\ell = \ell_2 = k/2$. Hence, there
are $\prod_{\tau=1}^{k/2} \binom{k - 2(\tau-1)}{2}$ distinct assemblies and

$$Q_2(k, \ell = k/2) = \frac{1}{(k/2)!} \prod_{\tau=1}^{k/2} \binom{k - 2(\tau-1)}{2} = \frac{k!}{2^{k/2} (k/2)!} \qquad [k \text{ even}] \qquad (20)$$

distinct conditioned graphs. The (k/2)! in the denominator comes from the fact that the
order in which the k/2 edges are chosen is irrelevant to the assembled graph, and thus their
permutations must be taken away. In $\vec{\zeta}$ notation, $\vec{\zeta} = (2, \dots, 2)$, i.e., $\zeta_\tau = 2$ for all $\tau$, and
$\dim \vec{\zeta} = \ell_2$. In this example, $\vec{\zeta}$ is unique.

The next example to consider is when k is an odd number and $\ell$ is minimal ($\ell = (k+1)/2$
in this case). To assemble such conditioned graphs, any one of the k nodes must be reused
exactly once to achieve the condition of all nodes being connected to at least one edge, and
thus $\ell_1 = 1$ and $\ell_2 = (k-1)/2$. As before, one chooses the first edge out of $\binom{k}{2}$ possibilities,
and from $\tau = 2$ and beyond the possibility to add the single edge of type $\ell_1$ is available. If
this edge is added in step $\tau = b$, and summing over all possible values of b, the enumeration
becomes

$$Q_2(k, \ell = (k+1)/2) = \frac{1}{\left(\frac{k+1}{2}\right)!} \sum_{b=2}^{(k+1)/2} (2(b-1))(k - 2(b-1)) \prod_{a_1=1}^{b-1} \binom{k - 2(a_1-1)}{2} \prod_{a_2=b+1}^{(k+1)/2} \binom{k - 2(a_2-1) + 1}{2}$$
$$\qquad = \frac{k!}{2^{(k-1)/2} \left(\frac{k-3}{2}\right)!} \qquad (21)$$

because there are $\binom{k}{2}\binom{k-2}{2} \cdots \binom{k-2(b-2)}{2}$ ways to assemble the first 2(b - 1) nodes using type
$\ell_2$ edges, (2(b - 1))(k - 2(b - 1)) possible ways to introduce the type $\ell_1$ edge, and after that,
there are still k - 2(b - 1) - 1 remaining nodes that are connected in $\binom{k-2(b-1)-1}{2} \cdots \binom{2}{2}$
possible ways (k - 2(b - 1) - 1 is even). The denominator $\left(\frac{k+1}{2}\right)!$ corrects for order in the
permutations of the edge addition. In $\vec{\zeta}$ notation, each value of b is associated with a $\vec{\zeta}$ in
which $\zeta_{\tau=b} = 1$ and $\zeta_{\tau \neq b} = 2$, and $\dim \vec{\zeta} = (k+1)/2$. In this example, unlike before, $\vec{\zeta}$ is not
unique; there are (k - 1)/2 different $\vec{\zeta}$, one for each choice of b between 2 and (k + 1)/2.
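The sum over the entry step b of the single type $\ell_1$ edge can be evaluated numerically and compared with the closed form for odd k. The following is a minimal sketch of that comparison; the function names are hypothetical and `math.prod` requires Python 3.8 or later.

```python
from math import comb, factorial, prod


def q2_odd_minimal_by_assembly(k):
    """Sum over the step b at which the single type-l1 edge enters (k odd)."""
    if k < 3 or k % 2 == 0:
        raise ValueError("k must be odd and at least 3")
    ell = (k + 1) // 2
    total = 0
    for b in range(2, ell + 1):
        # type-l2 edges placed before step b
        before = prod(comb(k - 2 * (a - 1), 2) for a in range(1, b))
        # the type-l1 edge: one discovered node times one undiscovered node
        reuse = (2 * (b - 1)) * (k - 2 * (b - 1))
        # type-l2 edges placed after step b
        after = prod(comb(k - 2 * (a - 1) + 1, 2) for a in range(b + 1, ell + 1))
        total += before * reuse * after
    return total // factorial(ell)      # remove the ordering of the edges


def q2_odd_minimal_closed_form(k):
    """k! / (2^((k-1)/2) ((k-3)/2)!) for odd k."""
    return factorial(k) // (2 ** ((k - 1) // 2) * factorial((k - 3) // 2))


values = {k: q2_odd_minimal_by_assembly(k) for k in (3, 5, 7, 9, 11)}
```

The k = 3 entry reproduces the three two-edge graphs on {a,b,c} listed in Sec. II.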

For the last example, consider $k$ even and $\ell = k/2+1$ (no longer minimal). In this
situation, one can either: i) connect all pairs of nodes by use of $\ell_2 = k/2$ edges while at
some step $\tau = c$ use a single type $\ell_0$ edge to connect two nodes of the $u_{\tau-1=c-1}$
that have already been discovered at $\tau-1 = c-1$, or ii) connect $k-2$ nodes via
$\ell_2 = (k-2)/2$ edges, and also use $\ell_1 = 2$ edges at steps $\tau = b_1$ and $\tau = b_2$ to
connect the remaining 2 nodes; the two cases are mutually exclusive. Therefore, considering all
possible $b_1, b_2, c$,

$$
\begin{aligned}
Q_2\!\left(k,\ell=\tfrac{k}{2}+1\right)
= \frac{1}{\left(\frac{k}{2}+1\right)!}\Bigg[
&\sum_{b_1=2}^{k/2}\ \sum_{b_2=b_1+1}^{k/2+1}[2(b_1-1)][k-2(b_1-1)]\,[2(b_2-1)-1][k-2(b_2-1)+1]\\
&\quad\times\prod_{a_1=1}^{b_1-1}\binom{k-2(a_1-1)}{2}
\prod_{a_2=b_1+1}^{b_2-1}\binom{k-2(a_2-1)+1}{2}
\prod_{a_3=b_2+1}^{k/2+1}\binom{k-2(a_3-1)+2}{2}\\
&+\sum_{c=3}^{k/2+1}\left[\binom{2(c-1)}{2}-(c-1)\right]\prod_{a_1=1}^{k/2}\binom{k-2(a_1-1)}{2}\Bigg]\\
= \frac{k!}{2^{k/2}\left(\frac{k}{2}-2\right)!}&\left(\frac{3k+4}{6}\right)\qquad[k\ \text{even}].
\end{aligned}\tag{22}
$$

The first set of sums in the square brackets enumerates the cases of two separate instances of
visiting one used node ($\ell_2=(k-2)/2$, $\ell_1=2$, $\ell_0=0$), and the second sum enumerates the
cases when one edge is placed between two previously used nodes ($\ell_2=k/2$, $\ell_1=0$,
$\ell_0=1$). For the second sum, note that any type $\ell_0$ edge placed between two nodes already
present occurs when 4 or 6 or ... $k$ nodes have been used for the first time. At each of these
steps, the number of choices is $\binom{4}{2}-2,\ \binom{6}{2}-3,\ \ldots,\ \binom{k}{2}-k/2$, which
accounts for the number of possible edges between the nodes present minus the edges that have
been placed. Generally, for a type $\ell_0$ edge introduced at step $\tau$, the factor associated
with its enumeration is $\binom{u_{\tau-1}}{2}-(\tau-1)$. Note that for such an edge
$u_{\tau-1}=u_\tau$. Once again, the prefactor accounts for eliminating the permutations among
overall edge placement order. In $\vec{\xi}$ notation, there are now two kinds of vectors: for
$\ell_2=(k-2)/2$, $\ell_1=2$, $\ell_0=0$, there are $\binom{k/2}{2}$ distinct $\vec{\xi}$, one for each
$\xi_{\tau=b_1}=1$, $\xi_{\tau=b_2}=1$, $\xi_{\tau\neq b_1,b_2}=2$; for $\ell_2=k/2$, $\ell_1=0$,
$\ell_0=1$ there are $(k-2)/2$ vectors $\vec{\xi}$, one for each case of $\xi_{\tau=c}=0$,
$\xi_{\tau\neq c}=2$.
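As a numerical sanity check on the closed forms of Eqs. (21) and (22), they can be compared with a direct count of labelled graphs on $k$ nodes with $\ell$ edges and no isolated node. The Python sketch below assumes the standard inclusion-exclusion expression $\sum_{m}(-1)^{k-m}\binom{k}{m}\binom{\binom{m}{2}}{\ell}$ for that count; the function names are illustrative and not part of the original text.

```python
from math import comb, factorial


def q2(k, l):
    """Labelled graphs on k nodes with l edges and no isolated node.

    Assumption: standard inclusion-exclusion over the set of covered nodes.
    """
    return sum((-1) ** (k - m) * comb(k, m) * comb(comb(m, 2), l)
               for m in range(k + 1))


def q2_odd_minimal(k):
    """Closed form of Eq. (21): k odd, l = (k + 1) / 2."""
    return factorial(k) // (2 ** ((k - 1) // 2) * factorial((k - 3) // 2))


def q2_even_next(k):
    """Closed form of Eq. (22): k even, l = k / 2 + 1."""
    return (factorial(k) * (3 * k + 4)
            // (6 * 2 ** (k // 2) * factorial(k // 2 - 2)))
```

For small $k$ the two routes can be compared directly, e.g. `q2(5, 3)` against `q2_odd_minimal(5)` and `q2(6, 4)` against `q2_even_next(6)`.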
It is clear that in all examples above, one can use a shorthand to represent the sums for
the assemblies by using $\vec{\xi}$. Thus, $Q_2(k,\ell) = (\ell!)^{-1}\sum_{\vec{\xi}} C(\vec{\xi})$,
where $C(\vec{\xi})$ is the combinatorial factor associated with an assembly history $\vec{\xi}$,
and the $\vec{\xi}$ are chosen to satisfy the given $k$ and $\ell$.

2. Setup of the $Q_2(k,\ell)$ calculation

The three calculations above exhibit all types of edges in the assembly process: edges
that visit two new nodes, edges that visit one new node and a previously visited node,
and edges that visit two already visited nodes. Clearly, the kinds and numbers of edges
used are constrained to satisfy the definition of $Q_2(k,\ell)$ as explained below. The function
that each edge performs (type $\ell_2$, $\ell_1$ or $\ell_0$) depends on the step at which it is added,
which is equivalent to assuming that edges are distinguishable. The advantage of making this
distinguishability available is that it converts the counting of $Q_2(k,\ell)$ into a process that
is tractable, i.e., it provides rules to count all possibilities. However, if one looks at the
final product of the assembly, the relevant conditioned graphs of $Q_2(k,\ell)$, it would be
impossible to determine which edge came first or what function it performed (this is the reason
why one divides $\sum_{\vec{\xi}} C(\vec{\xi})$ by $\ell!$). Essentially, $Q_2(k,\ell)$ is calculated
by first enumerating all possible assemblies that lead to the conditioned graphs, and then
taking away the edge permutations.

As it was shown in Eq. (22), there are multiple choices of $\ell_0,\ell_1,\ell_2$ for given $k$ and
$\ell$. Given that in $Q_2(k,\ell)$, $k$ and $\ell$ are specified, it is necessary to express the
conditions on $\ell_0,\ell_1,\ell_2$ as functions of $k$ and $\ell$. But one cannot solve for all three
$\ell_0,\ell_1,\ell_2$ from Eqs. (18) and (19). However, it is possible to solve for $\ell_1,\ell_2$ by
focusing on $\ell-\ell_0$ and $k$. The solution is $\ell_2 = k-(\ell-\ell_0)$ and
$\ell_1 = 2(\ell-\ell_0)-k$. By taking $\ell_0$ as a free parameter, and running over all its possible
values, all triplets $\ell_0,\ell_1,\ell_2$ are uniquely specified. All that remains is to determine
the allowed range for $\ell_0$, which emerges from determining the minimum and maximum
$\ell_1+\ell_2$ $(=\ell-\ell_0)$ necessary to visit $k$ nodes, while keeping in mind that
$\ell_2\geq 1$ since the first edge is always type $\ell_2$: the minimum occurs when
$\ell_2=\lfloor k/2\rfloor$ and $\ell_1=k-2\lfloor k/2\rfloor$ (which gives $\ell_1=0$ or 1), so
$\ell_1+\ell_2=\lceil k/2\rceil$; the maximum occurs for $\ell_2=1$ and $\ell_1=k-2$, with
$\ell_1+\ell_2=k-1$. Therefore, $\lceil k/2\rceil\leq\ell_1+\ell_2\leq k-1$, leading to
$\ell-\lceil k/2\rceil\geq\ell_0\geq\ell-(k-1)$. For each unique triplet $\ell_0,\ell_1,\ell_2$, one can
define the number of conditioned graph assemblies $F(\ell_2,\ell_1,\ell_0)$, and

$$
Q_2(k,\ell)=\sum_{\ell_0=\ell-(k-1)}^{\ell-\lceil k/2\rceil}F(\ell_2,\ell_1,\ell_0)
=\sum_{\ell_0=\ell-(k-1)}^{\ell-\lceil k/2\rceil}\ \sum_{\vec{\xi}\in\{\vec{\xi}(\ell_2,\ell_1,\ell_0)\}_{\rm conf}}F(\vec{\xi}), \tag{23}
$$

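where $\{\vec{\xi}(\ell_2,\ell_1,\ell_0)\}_{\rm conf}=\{\vec{\xi}\}_{\rm conf}$ corresponds to the set
of all allowed histories $\vec{\xi}$ consistent with $\ell_0,\ell_1,\ell_2$. Each $F(\vec{\xi})$ has
the form

$$
F(\vec{\xi})=(\ell!)^{-1}C(\vec{\xi})=(\ell!)^{-1}\prod_{\tau=1}^{\ell}f_\tau(u_{\tau-1},\xi_\tau), \tag{24}
$$

where $f_\tau(u_{\tau-1},\xi_\tau)$ corresponds to the combinatorial assembly factor associated with
the addition of the edge of type $\xi_\tau$ at step $\tau$, at which point the number of
discovered nodes is $u_{\tau-1}$. As stated before,
$f_\tau(u_{\tau-1},\xi_\tau=2)=\binom{k-u_{\tau-1}}{2}$,
$f_\tau(u_{\tau-1},\xi_\tau=1)=u_{\tau-1}(k-u_{\tau-1})$, and
$f_\tau(u_{\tau-1},\xi_\tau=0)=\binom{u_{\tau-1}}{2}-(\tau-1)$. The number of used nodes up to step
$\tau$ is given by

$$
u_{\tau-1}=\sum_{\tau'=1}^{\tau-1}\xi_{\tau'}, \tag{25}
$$

which completes the calculation.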
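Equations (23)-(25) can be evaluated by brute force for small $k$ and $\ell$. The following Python sketch enumerates every history $\vec{\xi}\in\{0,1,2\}^{\ell}$ that starts with a type $\ell_2$ edge and visits exactly $k$ nodes, multiplies the step factors $f_\tau$, and divides by $\ell!$. The inclusion-exclusion count used for comparison is an assumption of this sketch (the standard formula for labelled graphs without isolated nodes), and all function names are illustrative.

```python
from itertools import product
from math import comb, factorial


def q2_inclusion_exclusion(k, l):
    """Assumed reference value: labelled graphs, l edges, no isolated node."""
    return sum((-1) ** (k - m) * comb(k, m) * comb(comb(m, 2), l)
               for m in range(k + 1))


def history_weight(xi, k):
    """C(xi) of Eq. (24): product of the step factors f_tau."""
    used = 0          # u_{tau-1}, updated through Eq. (25)
    weight = 1
    for tau, edge_type in enumerate(xi, start=1):
        if edge_type == 2:
            factor = comb(k - used, 2)
        elif edge_type == 1:
            factor = used * (k - used)
        else:
            factor = comb(used, 2) - (tau - 1)
        if factor <= 0:   # no admissible placement at this step
            return 0
        weight *= factor
        used += edge_type
    return weight


def q2_from_histories(k, l):
    """Eq. (23): sum of C(xi) over allowed histories, divided by l!."""
    total = 0
    for xi in product((0, 1, 2), repeat=l):
        if xi[0] == 2 and sum(xi) == k:
            total += history_weight(xi, k)
    return total // factorial(l)
```

For example, `q2_from_histories(4, 3)` sums the histories (2, 2, 0) and (2, 1, 1), with weights 24 and 72, and returns 96/3! = 16. The cost grows as $3^{\ell}$, so this is only meant as a check of the bookkeeping, not as a practical algorithm.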

However, given that calculating $Q_2(k,\ell)$ involves summing over all possible $\vec{\xi}$,
further specification is possible with more concrete results. Below, the calculation of
$F(\ell_2,\ell_1,\ell_0)$ is tackled in steps by first addressing $F(\ell_2,\ell_1,\ell_0=0)$ and then
using this result to introduce $\ell_0$ edges and complete the calculation of $Q_2(k,\ell)$.

3. Calculation of $F(\ell_2,\ell_1,\ell_0=0)$

When $\ell_0=0$, only the combinatorics of $\ell_1$ and $\ell_2$ edges are needed. It is useful to
introduce the redefinition $(\vec{\xi},\tau)\to(\vec{h},t)$ in this case (the reason becomes clear
in the next section). In this notation $\dim\vec{h}=\ell_2+\ell_1$, and each component $h_t$ can only
be 1 or 2. The difference between two histories $\vec{h}$ and $\vec{h}'$ with equal $\ell_1$ and
$\ell_2$ is the specific steps $t$ in which $h_t=1$, i.e., the steps in which the type $\ell_1$ edges
are introduced. It is then convenient to define a set $\{b_1,\ldots,b_{\ell_1}\}$ corresponding to
the steps $t$ of the first, second, ..., $\ell_1$-th introductions of type $\ell_1$ edges, and a
counter $\lambda$ from 0 to $\ell_1$. For a concrete $\vec{h}$, for $\lambda=1$ there is an associated
step $t=b_1$. The enumeration of type $\ell_2$ edges up to $t=b_1-1$ is
$\prod_{a_1=1}^{b_1-1}\binom{k-2(a_1-1)}{2}$ and at $t=b_1$ it is $2(b_1-1)(k-2(b_1-1))$. Between
the $\lambda-1$ and $\lambda$ edges of type $\ell_1$, that is steps
$b_{\lambda-1}+1\leq t\leq b_\lambda-1$, the enumeration of type $\ell_2$ edges is
$\prod_{a_\lambda=b_{\lambda-1}+1}^{b_\lambda-1}\binom{k-2(a_\lambda-1)+(\lambda-1)}{2}$, and at
$t=b_\lambda$ it is $(2(b_\lambda-1)-(\lambda-1))(k-2(b_\lambda-1)+(\lambda-1))$. These
considerations lead to the expression

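$$
\begin{aligned}
F^{(0)}(\ell_2,\ell_1)=\frac{1}{(\ell_2+\ell_1)!}
&\sum_{b_1=2}^{\ell_2+1}\left[\prod_{a_1=1}^{b_1-1}\binom{k-2(a_1-1)}{2}\right][2(b_1-1)][k-2(b_1-1)]\times\cdots\\
\times&\sum_{b_\lambda=b_{\lambda-1}+1}^{\ell_2+\lambda}\left[\prod_{a_\lambda=b_{\lambda-1}+1}^{b_\lambda-1}\binom{k-2(a_\lambda-1)+(\lambda-1)}{2}\right]
[2(b_\lambda-1)-(\lambda-1)][k-2(b_\lambda-1)+(\lambda-1)]\times\cdots\\
\times&\sum_{b_{\ell_1}=b_{\ell_1-1}+1}^{\ell_2+\ell_1}\left[\prod_{a_{\ell_1}=b_{\ell_1-1}+1}^{b_{\ell_1}-1}\binom{k-2(a_{\ell_1}-1)+(\ell_1-1)}{2}\right]
[2(b_{\ell_1}-1)-(\ell_1-1)][k-2(b_{\ell_1}-1)+(\ell_1-1)]\\
\times&\prod_{a_{\ell_1+1}=b_{\ell_1}+1}^{\ell_2+\ell_1}\binom{k-2(a_{\ell_1+1}-1)+\ell_1}{2},
\end{aligned}\tag{26}
$$

where the sums in Eq. (26) reflect all possible ways to choose the set of $b_\lambda$.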

Equation (26) can be evaluated by noting that the factors due to $\ell_2$ edges together with
the factors of form $[k-2(b_\lambda-1)+(\lambda-1)]$ within the type $\ell_1$ enumeration combine to
the factorial $k!=(2\ell_2+\ell_1)!$. The denominators coming from the $\ell_2$ factors produce
$2^{\ell_2}$. What remains is the sum of products of the form $[2(b_\lambda-1)-(\lambda-1)]$, which
come from type $\ell_1$ edges and count the ways to pick nodes from those that have been
discovered in steps previous to $t=b_\lambda$, for all $\lambda\leq\ell_1$. Hence, one can write

$$
F^{(0)}(\ell_2,\ell_1)=\frac{k!}{(\ell_2+\ell_1)!\,2^{\ell_2}}A_1(\ell_2,\ell_1), \tag{27}
$$

where

$$
A_1(\ell_2,\ell_1)\equiv\sum_{b_1=2}^{\ell_2+1}\ \sum_{b_2=b_1+1}^{\ell_2+2}\cdots\sum_{b_{\ell_1}=b_{\ell_1-1}+1}^{\ell_2+\ell_1}
[2(b_1-1)][2(b_2-1)-1]\cdots[2(b_{\ell_1}-1)-(\ell_1-1)]. \tag{28}
$$

Equation (27) states that the number of ways to assemble the $k$ nodes when $\ell_0=0$ is
proportional to the permutations of the nodes and the number of choices in which single
previously used nodes can be picked (as $\ell_1$ edges are introduced). In $\vec{h}$ notation,
$A_1(\ell_2,\ell_1)$ can be written as

$$
A_1(\ell_2,\ell_1)=\sum_{\vec{h}\in\{\vec{h}(\ell_2,\ell_1)\}_{\rm conf}}A_1(\vec{h})
=\sum_{\vec{h}\in\{\vec{h}(\ell_2,\ell_1)\}_{\rm conf}}\ \prod_{b\in\{t|h_t=1\}}[2(b-1)-(\lambda-1)], \tag{29}
$$
where (with a slight abuse of notation) $b$ is an element of $\{t|h_t=1\}$, the set of all steps in
assembly $\vec{h}$ at which a type $\ell_1$ edge is added, and $\lambda$ is the position of $b$ within
that ordered set. In $\vec{h}$ notation,

$$
F^{(0)}(\ell_2,\ell_1)=\sum_{\vec{h}\in\{\vec{h}\}_{\rm conf}}F(\vec{h})
=\frac{k!}{(\ell_2+\ell_1)!\,2^{\ell_2}}\sum_{\vec{h}\in\{\vec{h}\}_{\rm conf}}A_1(\vec{h}). \tag{30}
$$

To develop some intuition about $F^{(0)}(\ell_2,\ell_1)$, it is useful to make reference to a few
examples: if $\ell_2=1$ and thus $\ell_1=k-2$, $F^{(0)}(\ell_2,\ell_1)$ is the number of distinct
realizations of invasion percolation without trapping, where the initial seed is an edge (of
indistinguishable nodes). For $\ell_2>1$, $F^{(0)}(\ell_2,\ell_1)$ counts a forest of $\ell_2$ of these
invasion percolation trees (the trees never coalesce).
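The coefficient $A_1(\ell_2,\ell_1)$ of Eq. (28) is straightforward to evaluate numerically, because the nested sums over $b_1<b_2<\cdots<b_{\ell_1}$ run over all ways of choosing $\ell_1$ steps among $t=2,\ldots,\ell_2+\ell_1$. The Python sketch below uses that observation (an assumption of the sketch, equivalent to the limits written in Eq. (28)) and then applies Eq. (27); function names are illustrative.

```python
from fractions import Fraction
from itertools import combinations
from math import factorial


def a1(l2, l1):
    """A_1(l2, l1) of Eq. (28): sum over the steps b_1 < ... < b_l1."""
    total = 0
    for steps in combinations(range(2, l2 + l1 + 1), l1):
        term = 1
        for lam, b in enumerate(steps, start=1):
            # used nodes available when the lam-th type-l1 edge is added
            term *= 2 * (b - 1) - (lam - 1)
        total += term
    return total


def f0(l2, l1):
    """F^(0)(l2, l1) of Eq. (27), with k = 2 * l2 + l1."""
    k = 2 * l2 + l1
    return Fraction(factorial(k) * a1(l2, l1),
                    factorial(l2 + l1) * 2 ** l2)
```

Two limiting cases follow directly: for $\ell_1=1$ the sum gives $A_1=\ell_2(\ell_2+1)$, which reproduces Eq. (21), and for $\ell_2=1$ only one ordering exists, giving $A_1=(\ell_1+1)!$ and $F^{(0)}=k!/2$.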

4. Introducing $\ell_0>0$ and full $Q_2(k,\ell)$

To introduce an edge of type $\ell_0$, there must be nodes already used and, in addition,
pairs of them that have not been directly connected by another edge. These unconnected
node pairs are vacancies. The combinatorics of type $\ell_0$ edges require counting the vacancies
available as the conditioned graph assembly progresses. The availability of vacancies is
restricted by the assembly sequence $\vec{h}$. For instance, consider the first two steps of any
assembly. After the first edge of type $\ell_2$, the second edge can only be type $\ell_1$ or
$\ell_2$, but not type $\ell_0$ because there are no vacancies in the graph yet. At $t=3$ the first
type $\ell_0$ edges can be introduced since there are four vacancies if the second edge is type
$\ell_2$ or one vacancy if it is type $\ell_1$.

Edges of type $\ell_0$ can be placed in any step $t$ of the sequence $\vec{h}$ where there are
available vacancies, and to obtain the full enumeration, all possible placings must be counted.
Fortunately, even though placing a type $\ell_0$ edge is conditional on the vacancies created by
$\ell_2$ and $\ell_1$, the opposite is not true, i.e., placings of $\ell_1$ and $\ell_2$ are unaffected
by $\ell_0$, and thus the results of $F^{(0)}(\ell_2,\ell_1)$ can be used here. This is because the
combinatorics of type $\ell_1,\ell_2$ edges only depend on the numbers of used and unused nodes,
and type $\ell_0$ edges have no effect on those.

Following the previous description, it makes sense to introduce
$\vec{v}=(v_1,v_2,\ldots,v_{\ell_1+\ell_2})$, the vacancies available due to the addition of type
$\ell_1$ and $\ell_2$ edges, at the respective steps $t=1,2,\ldots,\ell_1+\ell_2$ of $\vec{h}$ (clearly
$\vec{v}$ is a function of $\vec{h}$). These are the vacancies where type $\ell_0$ edges can be
placed. To track type $\ell_0$ edges, one defines $\vec{n}=(n_1,n_2,n_3,\ldots,n_{\ell_1+\ell_2})$, the
number of edges of type $\ell_0$ placed, respectively, immediately after $t=1,2,3,\ldots,\ell_1+\ell_2$
edges of type $\ell_1$ and $\ell_2$ have been added (to be clear, at step $t$, an edge of type
$\ell_2$ or $\ell_1$ is added, leading to $v_t$, and before the next step $t+1$, $n_t$ edges of type
$\ell_0$ are added). Both $v_1$ and $n_1$ are equal to zero because there are no vacancies created
with the first edge addition, and thus it is valid to omit them from $\vec{n}$ and $\vec{v}$ if
desired. To determine the combinatorial weight of any particular sequence of $\ell_0$ placings,
edges can choose among the available vacancies: at step $t=2$, there are $v_2$ vacancies, and so
$0\leq n_2\leq v_2$, which can be done in $v_2!/(v_2-n_2)!$ ways (keeping in mind the edges are
considered distinguishable); at $t=3$, there are $v_3-n_2$ vacancies, and $0\leq n_3\leq v_3-n_2$,
with combinatorial weight $(v_3-n_2)!/(v_3-n_2-n_3)!$; etc. Therefore, the number of combinations
for the sequences $\vec{v}$ and $\vec{n}$ is



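$$
A_0(\vec{n},\vec{v}(\vec{h}))=\prod_{t=2}^{\ell_1+\ell_2}
\frac{\left(v_t-\sum_{t'=2}^{t-1}n_{t'}\right)!}{\left(v_t-\sum_{t'=2}^{t}n_{t'}\right)!}
\qquad\left[\text{constrained to }\ell_0=\sum_{t=2}^{\ell_1+\ell_2}n_t\right]. \tag{31}
$$

For a given sequence $\vec{v}$, all allowed $\vec{n}$ contribute to $Q_2(k,\ell)$, and therefore it
is necessary to sum over all $\vec{n}$ subject to the condition in the brackets. Thus, to each
term $A_1(\vec{h})$, one multiplies the factor

$$
\sum_{[n_2+\cdots+n_{\ell_1+\ell_2}=\ell_0]}A_0(\vec{n},\vec{v}(\vec{h})), \tag{32}
$$

where the notation of the sum implies summing over all combinations of $n_t$ that satisfy the
constraint $n_2+\cdots+n_{\ell_1+\ell_2}=\ell_0$. To fully specify the previous, and recalling
Eq. (25), $v_t$ is given by

$$
v_t=\binom{u_t}{2}-t, \tag{33}
$$

which has already been mentioned in the discussions of Eqs. (22) and (24).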

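These results can now be put together in a single expression. From Eqns. (30) and (31)

$$
F(\ell_2,\ell_1,\ell_0)=\frac{k!}{\ell!\,2^{\ell_2}}\sum_{\vec{h}}A_1(\vec{h})\sum_{\vec{n}}A_0(\vec{n},\vec{v}(\vec{h})). \tag{34}
$$

With the use of Eqn. (23) and the relations between $k,\ell$ and $\ell_2,\ell_1,\ell_0$, this
translates into the final result

$$
\begin{aligned}
Q_2(k,\ell)&=\sum_{\ell_0=\ell-(k-1)}^{\ell-\lceil k/2\rceil}F\big(k-(\ell-\ell_0),\,2(\ell-\ell_0)-k,\,\ell_0\big)\\
&=\frac{k!}{\ell!}\sum_{\ell_0=\ell-(k-1)}^{\ell-\lceil k/2\rceil}\frac{1}{2^{k-(\ell-\ell_0)}}
\sum_{\vec{h}}A_1(\vec{h})\sum_{\vec{n}}A_0(\vec{n},\vec{v}(\vec{h})).
\end{aligned}\tag{35}
$$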
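The structure of Eq. (35) can be checked numerically for small graphs. The Python sketch below loops over $\ell_0$, over the positions of the type $\ell_1$ edges (which fix $\vec{h}$, $A_1(\vec{h})$ and the vacancies $v_t$ of Eq. (33)), and over all $\vec{n}$ with $n_2+\cdots+n_{\ell_1+\ell_2}=\ell_0$. Two choices are assumptions of the sketch: the lower limit of $\ell_0$ is clipped at zero, and the reference value is the standard inclusion-exclusion count of labelled graphs without isolated nodes. All names are illustrative.

```python
from fractions import Fraction
from itertools import combinations
from math import comb, factorial


def compositions(total, parts):
    """All tuples of `parts` non-negative integers adding up to `total`."""
    if parts == 0:
        if total == 0:
            yield ()
        return
    for first in range(total + 1):
        for rest in compositions(total - first, parts - 1):
            yield (first,) + rest


def a0(n, v):
    """A_0 of Eq. (31) for n = (n_2, ...) and v = (v_2, ...)."""
    placed, weight = 0, 1
    for n_t, v_t in zip(n, v):
        free = v_t - placed
        if n_t > free:
            return 0
        weight *= factorial(free) // factorial(free - n_t)
        placed += n_t
    return weight


def q2_eq35(k, l):
    """Q_2(k, l) assembled as in Eq. (35)."""
    total = Fraction(0)
    for l0 in range(max(0, l - (k - 1)), l - (k + 1) // 2 + 1):
        l2, l1 = k - (l - l0), 2 * (l - l0) - k
        steps = l1 + l2
        inner = 0
        for b_steps in combinations(range(2, steps + 1), l1):
            a1_h, used, v = 1, 0, []
            for t in range(1, steps + 1):
                if t in b_steps:
                    a1_h *= used      # equals 2(b - 1) - (lambda - 1)
                    used += 1
                else:
                    used += 2
                if t >= 2:
                    v.append(comb(used, 2) - t)   # Eq. (33)
            inner += a1_h * sum(a0(n, v)
                                for n in compositions(l0, steps - 1))
        total += Fraction(factorial(k), factorial(l) * 2 ** l2) * inner
    return total


def q2_reference(k, l):
    """Assumed reference: inclusion-exclusion over covered nodes."""
    return sum((-1) ** (k - m) * comb(k, m) * comb(comb(m, 2), l)
               for m in range(k + 1))
```

As a worked case, $k=4$, $\ell=3$ has two contributions: $\ell_0=0$ with $(\ell_2,\ell_1)=(1,2)$ gives 12, and $\ell_0=1$ with $(\ell_2,\ell_1)=(2,0)$ gives 4, for a total of 16.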
It is interesting to write down a few results for $Q_2(k,\ell)$ to gain some concrete intuition of
how the numbers evolve as $k$ and $\ell$ change (see Table I). Evidently, since the sums over
$\vec{h}$ and $\vec{n}$ span all possible cases, the effect of assembly histories is summed away,
and it is sensible to define a combinatorial coefficient dependent only on $k,\ell,\ell_0$. Thus

$$
\begin{aligned}
A(\ell_2,\ell_1,\ell_0)&=A\big(k-(\ell-\ell_0),\,2(\ell-\ell_0)-k,\,\ell_0\big)
=\sum_{\vec{h}}A_1(\vec{h})\sum_{\vec{n}}A_0(\vec{n},\vec{v}(\vec{h}))\\
&=\sum_{b_1=2}^{\ell_2+1}\cdots\sum_{b_{\ell_1}=b_{\ell_1-1}+1}^{\ell_2+\ell_1}\ \prod_{\lambda=1}^{\ell_1}[2(b_\lambda-1)-(\lambda-1)]
\sum_{[n_2+\cdots+n_{\ell_1+\ell_2}=\ell_0]}\ \prod_{t=2}^{\ell_1+\ell_2}
\frac{\left(v_t-\sum_{t'=2}^{t-1}n_{t'}\right)!}{\left(v_t-\sum_{t'=2}^{t}n_{t'}\right)!},
\end{aligned}\tag{36}
$$

where $v_t$ is defined through Eqns. (25) and (33). The author is not aware of any combinatorial
identity that allows the previous expression to be reduced further. Clearly, using the
inclusion-exclusion principle, the left and right hand sides of Eq. (35) could be evaluated to
write an alternating series for $A$, but this would defeat the purpose of having only additive
terms. Multivariate asymptotics of the expressions inside the sums are in principle possible
in the field of enumerative asymptotics [2J]-[26], but techniques are not yet well suited for
arbitrary dimension calculations in cases such as $A$.

A straightforward characterization of $Q_2(k,\ell)$ is found in Fig. 6, where the plots show
$\ln Q_2(k,\ell)$ and $\ln\!\left[Q_2(k,\ell)/\binom{\binom{k}{2}}{\ell}\right]$ as functions of $k$
and $\ell$. It is clear that to a large extent, $Q_2(k,\ell)\to\binom{\binom{k}{2}}{\ell}$ for large
enough $\ell$ with respect to $k$, but this behavior breaks down when $\ell\sim\lceil k/2\rceil$.
This limit behavior is also valid for general $r$. Results for $Q_2(k,\lceil k/2\rceil)$ (and for
general $r$ as well), where $\ell$ is at its minimum, are presented in Sec. IV B. A full treatment
of the asymptotics of $Q_2(k,\ell)$ is presented in Refs. [20], [21], and therefore will not be
repeated here.



C. Extension to $Q_{r-1}(k,\ell)$

The treatment above can be extended to arbitrary $r$. A conditioned hypergraph with $\ell$
hyperedges, of uniform rank $r-1$, where all $k$ nodes are visited by at least one hyperedge,
can be assembled via hyperedges that are differentiated in terms of the number of visited
nodes. Each hyperedge can find $0,1,2,\ldots,r-1$ nodes as it is placed, leading to the edge
types counted by $\ell_0,\ell_1,\ldots,\ell_{r-1}$. The inputs $k$ and $\ell$ satisfy

$$
\ell=\ell_0+\ell_1+\ell_2+\cdots+\ell_{r-1}, \tag{37}
$$
$$
k=\ell_1+2\ell_2+\cdots+(r-1)\ell_{r-1}. \tag{38}
$$
As explained for the case of $Q_2(k,\ell)$ in Sec. III B 2, all possible solutions to Eqns. (37)
and (38) need to be used in order to enumerate all possible conditioned hypergraphs that
contribute to $Q_{r-1}(k,\ell)$. In the present case, it is less straightforward to determine the
number of solutions to Eqns. (37) and (38) than in the $r=3$ case. However, it only requires
calling upon the definition of integer partitions to give an answer.

Recall that Eq. (38) on its own (with the additional restriction that all
$\ell_1,\ldots,\ell_{r-1}$ are non-negative integers) is in fact the condition satisfied by integer
partitions of $k$ in which the largest part is at most $r-1$ [23], [24]. The number of integer
partitions of $x$ with maximum part $y$ ($x,y$ both integers), expressed here as $p(x,y)$, has been
well studied, and is known to satisfy certain asymptotic formulas and recurrence relations. To
use this definition in the present case, a few details need to be dealt with because, aside from
Eq. (38), both Eq. (37) and $\ell_{r-1}\geq 1$ (the first edge is always type $\ell_{r-1}$) also need
to be satisfied. First, one can reduce Eq. (37) by subtracting $\ell_0$ from $\ell$ because the
former hyperedge type has no effect on $k$. Then, eliminating $\ell_1$ between $\ell-\ell_0$ and $k$
yields $k-(\ell-\ell_0)=\ell_2+2\ell_3+\cdots+(r-2)\ell_{r-1}$. In this form, almost all restrictions
have been absorbed, except for $\ell_{r-1}\geq 1$. Making the change of variables
$\ell'_{r-1}=\ell_{r-1}-1$, one can finally write the relation

$$
k-(\ell-\ell_0)-(r-2)=\ell_2+2\ell_3+\cdots+(r-2)\ell'_{r-1}. \tag{39}
$$

Now the variables $\ell_2,\ldots,\ell_{r-2},\ell'_{r-1}$ only need to be non-negative integers.
Therefore, the number of solutions is equal to $p(k-(\ell-\ell_0)-(r-2),\,r-2)$, where $r-2$ occurs
because $\ell'_{r-1}$ has coefficient $r-2$. Note that for $r=3$, one obtains
$p(k-(\ell-\ell_0)-1,\,1)=1$, i.e., the solutions are unique for given $k,\ell,\ell_0$. The number of
values that $\ell_0$ can take for $r=3$ is $\lfloor k/2\rfloor$. For arbitrary $r$, the number of
values for $\ell_0$ is determined from the limits of $\ell-\ell_0=\ell_1+\ell_2+\cdots+\ell_{r-1}$. The
smallest value, called $(\ell_1+\cdots+\ell_{r-1})_{\min}$, occurs when
$\ell_{r-1}=\lfloor k/(r-1)\rfloor$ and there is a single additional hyperedge of type $\ell_m$, where
$m=k-(r-1)\lfloor k/(r-1)\rfloor$ (if $m=0$ then exactly $\lfloor k/(r-1)\rfloor$ hyperedges are
needed); altogether, $(\ell_1+\cdots+\ell_{r-1})_{\min}=\lceil k/(r-1)\rceil$. On the other hand,
$(\ell_1+\cdots+\ell_{r-1})_{\max}=k-(r-2)$ due to $\ell_{r-1}=1$ and all other hyperedges finding one
node at a time, i.e. $\ell_1=k-(r-1)$. Therefore,
$\lceil k/(r-1)\rceil\leq\ell-\ell_0\leq 1+k-(r-1)$, which means
$\ell-(k-r+2)\leq\ell_0\leq\ell-\lceil k/(r-1)\rceil$.
With these considerations, the number of solutions to Eqns. (37) and (38) is

$$
\begin{aligned}
&\sum_{\ell_0=\ell-(k-r+2)}^{\ell-\lceil k/(r-1)\rceil}p\big(k-(\ell-\ell_0)-(r-2),\,r-2\big)\\
&\quad=\sum_{\ell_0=\ell-(k-r+2)}^{\ell-(k-2(r-2))}p\big(k-(\ell-\ell_0)-(r-2)\big)
+\sum_{\ell_0=\ell-(k-2(r-2))+1}^{\ell-\lceil k/(r-1)\rceil}p\big(k-(\ell-\ell_0)-(r-2),\,r-2\big),
\end{aligned}\tag{40}
$$

where $p(k-(\ell-\ell_0)-(r-2))$ is the number of integer partitions with no restriction. The
second sum in the last equality occurs because the restriction of the largest number to be
$r-2$ begins to apply for $\ell_0\geq\ell-k+2(r-2)+1$; if $k<2(r-1)$ this term drops out. For
small $r$ such as 3, 4, 5, these expressions can be studied exactly, by obtaining expressions for
restricted $p(x,y)$ from recurrence relations, and possibly using tables for unrestricted $p(x)$.
For instance, with the recurrence relation $p(x,y)=p(x,y-1)+p(x-y,y)$ and boundary conditions
$p(x,0)=0$, $p(0,y)=1$, and $p(x,y>x)=p(x)$ [27], one obtains $p(x,1)=1$ and
$p(x,2)=\lfloor x/2\rfloor+1$. As $r$ increases, asymptotics become necessary. Classic results are
available in this area, such as the Hardy-Ramanujan asymptotics
$p(x)\sim\exp\!\left(\pi\sqrt{2x/3}\right)/(4x\sqrt{3})$ and the asymptotics of restricted
partitions $p(x,y)\sim x^{y-1}/[y!\,(y-1)!]$ [24].
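The recurrence for $p(x,y)$ quoted above is easy to tabulate, and the result can be compared with a direct enumeration of the solutions of Eqs. (37) and (38). The Python sketch below does both. One point is an observation of this sketch rather than a statement from the text: the brute-force count enforces $\ell_1\geq 0$ explicitly, so for $r>3$ it can return fewer solutions than $p(k-(\ell-\ell_0)-(r-2),\,r-2)$, which counts the solutions of Eq. (39) alone (for instance $k=6$, $\ell-\ell_0=2$, $r=4$). Function names are illustrative.

```python
from functools import lru_cache
from itertools import product


@lru_cache(maxsize=None)
def p_restricted(x, y):
    """Partitions of x with every part at most y, via
    p(x, y) = p(x, y - 1) + p(x - y, y)."""
    if x == 0:
        return 1
    if x < 0 or y == 0:
        return 0
    return p_restricted(x, y - 1) + p_restricted(x - y, y)


def count_solutions(k, s, r):
    """Brute-force count of (l_1, ..., l_{r-1}) with l_{r-1} >= 1,
    sum l_m = s (= l - l_0) and sum m * l_m = k, all l_m >= 0."""
    count = 0
    for ls in product(range(s + 1), repeat=r - 1):
        nodes = sum(m * l_m for m, l_m in enumerate(ls, start=1))
        if ls[-1] >= 1 and sum(ls) == s and nodes == k:
            count += 1
    return count
```

For $r=3$ the two counts agree and equal 1 over the whole allowed range of $\ell-\ell_0$, as stated in the text.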

To complete this section, I describe the combinatorics of the placing of hyperedges in
the assembly process that leads to $Q_{r-1}(k,\ell)$. In the general case, a hyperedge of type
$\ell_m$ (with $1\leq m\leq r-1$) chooses $m$ unused nodes and $r-1-m$ used nodes. At any
given step $\tau$ of the assembly, there are $u_{\tau-1}$ nodes that have been used, and
$k-u_{\tau-1}$ that are yet to be used. The hyperedge at step $\tau$ has a combinatorial factor
$\binom{k-u_{\tau-1}}{m}\binom{u_{\tau-1}}{r-1-m}$. Type $\ell_0$ hyperedges are added in the
vacancies that other hyperedges provide, and their combinatorics are no different qualitatively
than in the case $r=3$: for $u_{\tau-1}$ used nodes, there are
$v_\tau=\binom{u_{\tau-1}}{r-1}-(\tau-1)$ vacancies, and this leads again to Eq. (32) with the
updated relation for $v_\tau$. The combinatorial contribution of each assembly history $\vec{\xi}$
is given by Eq. (24) with

$$
f_\tau(u_{\tau-1},\xi_\tau)=
\begin{cases}
\binom{k-u_{\tau-1}}{\xi_\tau}\binom{u_{\tau-1}}{r-1-\xi_\tau}; & [1\leq\xi_\tau\leq r-1]\\[4pt]
\binom{u_{\tau-1}}{r-1}-(\tau-1); & [\xi_\tau=0],
\end{cases}\tag{41}
$$

and $u_\tau=\sum_{\tau'=1}^{\tau}\xi_{\tau'}$. Although it is possible to write down the expression
for $Q_{r-1}(k,\ell)$, its cumbersome nature would not add much new intuition.
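IV. USEFUL RESULTS CONCERNING $Q_{r-1}(k,\ell)$

A. Some identities of $Q_{r-1}(k,\ell)$, normalization of $\psi_i(k_i,p)$, and moments $\langle k_i^q\rangle$

The calculation of $\langle k_i\rangle$ for arbitrary $r$ boils down to

$$
\langle k_i\rangle=\sum_{k_i=0}^{N-1}k_i\,\psi_i(k_i,p)
=\sum_{k_i=0}^{N-1}k_i\binom{N-1}{k_i}\sum_{\ell_i=\lceil k_i/(r-1)\rceil}^{\binom{k_i}{r-1}}
Q_{r-1}(k_i,\ell_i)\,p^{\ell_i}(1-p)^{\binom{N-1}{r-1}-\ell_i}. \tag{42}
$$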

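This calculation requires solving the sum $\sum_{k_i}k_i\binom{N-1}{k_i}Q_{r-1}(k_i,\ell_i)$ for all
allowed values of $k_i$. This evaluation can be done by reinserting the inclusion-exclusion
expression for $Q_{r-1}$ and using a generating function approach on the key sum. To be specific,

$$
\sum_{k=0}^{N-1}k\binom{N-1}{k}Q_{r-1}(k)
=\sum_{k=0}^{N-1}k\binom{N-1}{k}\sum_{m=0}^{k}(-1)^{k-m}\binom{k}{m}\binom{\binom{m}{r-1}}{\ell}, \tag{43}
$$

where again $i$ is dropped when appropriate because it is irrelevant for these identities. One
can then show, using generating functions (below), that

$$
\sum_{k=m}^{N-1}(-1)^k k\binom{k}{m}\binom{N-1}{k}
=(-1)^{N-1}(N-1)\left[\binom{N-2}{m}\delta_{m,N-2}+\binom{N-1}{m}\delta_{m,N-1}\right]
=(-1)^{N-1}(N-1)\left[\delta_{m,N-2}+\delta_{m,N-1}\right], \tag{44}
$$

leading to

$$
\sum_{k=0}^{N-1}k\binom{N-1}{k}Q_{r-1}(k)
=(N-1)\sum_{m}\binom{\binom{m}{r-1}}{\ell}\left[-\delta_{m,N-2}+\delta_{m,N-1}\right]
=(N-1)\binom{\binom{N-1}{r-1}}{\ell}-(N-1)\binom{\binom{N-2}{r-1}}{\ell}. \tag{45}
$$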
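Therefore, $\langle k_i\rangle$ becomes

$$
\langle k_i\rangle=\sum_{\ell_i=0}^{\binom{N-1}{r-1}}p^{\ell_i}(1-p)^{\binom{N-1}{r-1}-\ell_i}
\left[(N-1)\binom{\binom{N-1}{r-1}}{\ell_i}-(N-1)\binom{\binom{N-2}{r-1}}{\ell_i}\right]
=(N-1)\left[1-(1-p)^{\binom{N-2}{r-2}}\right], \tag{46}
$$

where $\binom{N-1}{r-1}=\binom{N-2}{r-1}+\binom{N-2}{r-2}$ has been used in
$(1-p)^{\binom{N-1}{r-1}}=(1-p)^{\binom{N-2}{r-1}}(1-p)^{\binom{N-2}{r-2}}$.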
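Equation (46) can be verified exactly for small systems by building the distribution term by term. The Python sketch below assumes the form of $\psi_i(k_i,p)$ that appears inside the sum for $\langle k_i\rangle$, i.e. $\binom{N-1}{k}\sum_{\ell}Q_{r-1}(k,\ell)\,p^{\ell}(1-p)^{\binom{N-1}{r-1}-\ell}$, together with the inclusion-exclusion expression for $Q_{r-1}$; rational arithmetic avoids rounding. Function names are illustrative.

```python
from fractions import Fraction
from math import comb


def q(k, l, r):
    """Q_{r-1}(k, l) by inclusion-exclusion over covered nodes (assumed form)."""
    return sum((-1) ** (k - m) * comb(k, m) * comb(comb(m, r - 1), l)
               for m in range(k + 1))


def psi(k, p, n_nodes, r):
    """Probability that a given node has exactly k distinct neighbors."""
    possible = comb(n_nodes - 1, r - 1)   # hyperedges containing the node
    weight = sum(q(k, l, r) * p ** l * (1 - p) ** (possible - l)
                 for l in range(comb(k, r - 1) + 1))
    return comb(n_nodes - 1, k) * weight


def mean_degree(p, n_nodes, r):
    """Direct evaluation of sum_k k * psi(k)."""
    return sum(k * psi(k, p, n_nodes, r) for k in range(n_nodes))


def mean_degree_closed(p, n_nodes, r):
    """Right-hand side of Eq. (46)."""
    return (n_nodes - 1) * (1 - (1 - p) ** comb(n_nodes - 2, r - 2))
```

With `p = Fraction(1, 5)`, the calls `mean_degree(p, 6, 3)` and `mean_degree_closed(p, 6, 3)` can be compared as exact fractions.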



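To show Eq. (44), let us define

$$
b_{N-1,m}\equiv\sum_{k=0}^{N-1}(-1)^k k\binom{k}{m}\binom{N-1}{k} \tag{47}
$$

and associate to it the generating function [23]

$$
b_{N-1}(z)=\sum_{m}b_{N-1,m}z^m
=\sum_{k}(-1)^k k\binom{N-1}{k}\sum_{m}\binom{k}{m}z^m
=\sum_{k}(-1)^k k\binom{N-1}{k}(1+z)^k. \tag{48}
$$

Noting that

$$
(1+z)\frac{d}{dz}\sum_{k}(-1)^k\binom{N-1}{k}(1+z)^k=\sum_{k}(-1)^k k\binom{N-1}{k}(1+z)^k, \tag{49}
$$

and using

$$
\sum_{k}(-1)^k\binom{N-1}{k}x^k=(1-x)^{N-1}, \tag{50}
$$

one obtains

$$
b_{N-1}(z)=\sum_{k}(-1)^k k\binom{N-1}{k}(1+z)^k
=(1+z)\frac{d}{dz}(-z)^{N-1}=(N-1)(-1)^{N-1}(1+z)z^{N-2}. \tag{51}
$$

To obtain the $m$-th coefficient of $b_{N-1}(z)$, one can apply

$$
\frac{1}{m!}\frac{d^m}{dz^m}b_{N-1}(z)\bigg|_{z=0}
=(-1)^{N-1}(N-1)\left[\binom{N-2}{m}\delta_{m,N-2}+\binom{N-1}{m}\delta_{m,N-1}\right]
=(-1)^{N-1}(N-1)\left[\delta_{m,N-2}+\delta_{m,N-1}\right], \tag{52}
$$

confirming Eq. (44).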
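The identity of Eq. (44) involves only binomial coefficients, so it can be checked directly for a range of $N$ and $m$. The following Python sketch evaluates the alternating sum term by term and compares it with the two Kronecker deltas; the function names are illustrative.

```python
from math import comb


def alternating_sum(m, n_nodes):
    """Left-hand side of Eq. (44): sum_k (-1)^k k C(k, m) C(N - 1, k)."""
    return sum((-1) ** k * k * comb(k, m) * comb(n_nodes - 1, k)
               for k in range(n_nodes))


def delta_form(m, n_nodes):
    """Right-hand side of Eq. (44)."""
    deltas = int(m == n_nodes - 2) + int(m == n_nodes - 1)
    return (-1) ** (n_nodes - 1) * (n_nodes - 1) * deltas
```

Only $m=N-2$ and $m=N-1$ survive, which is what collapses the double sum of Eq. (43) into the two terms of Eq. (45).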
Higher moments $\langle k_i^q\rangle$ can be calculated through a generalization of the previous
result, namely

$$
\sum_{k}(-1)^k k^q\binom{k}{m}\binom{N-1}{k}
=\frac{1}{m!}\frac{d^m}{dz^m}\left[\left((1+z)\frac{d}{dz}\right)^{q}(-z)^{N-1}\right]_{z=0}, \tag{53}
$$

where the parenthesis to the power $q$ is to be looked at as an operator that needs to be
expanded for specific $q$. For instance, for $q=2$, this identity leads to

$$
\langle k_i^2\rangle=(N-1)^2-(N-1)(2N-3)(1-p)^{\binom{N-2}{r-2}}
+(N-1)(N-2)(1-p)^{\binom{N-2}{r-2}+\binom{N-3}{r-2}}. \tag{54}
$$

The normalization of $\psi_i(k_i,p)$ can be confirmed by using

$$
\binom{\binom{N-1}{r-1}}{\ell}=\sum_{k=0}^{N-1}\binom{N-1}{k}Q_{r-1}(k,\ell), \tag{55}
$$

which simply states that the number of ways in which to choose $\ell$ distinct hyperedges of
rank $r-1$ out of a total of $\binom{N-1}{r-1}$ possibilities is equal to the sum of taking $k$
elements out of $N-1$, weighted by the number of ways in which those $k$ elements form $\ell$
groups of size $r-1$ such that no element goes unused ($Q_{r-1}(k,\ell)$). The expression can be
shown algebraically via generating functions, in the same kind of approach as above. Also, it can
be obtained by direct application of Eq. (53) with $q=0$.
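Both the covering identity of Eq. (55) and the second moment of Eq. (54) lend themselves to an exact check on small systems. The Python sketch below assumes the inclusion-exclusion expression for $Q_{r-1}$ and the binomial-mixture form of $\psi_i(k_i,p)$ used in the moment sums; names are illustrative and rational arithmetic is used throughout.

```python
from fractions import Fraction
from math import comb


def q(k, l, r):
    """Q_{r-1}(k, l) by inclusion-exclusion (assumed form)."""
    return sum((-1) ** (k - m) * comb(k, m) * comb(comb(m, r - 1), l)
               for m in range(k + 1))


def covering_sum(l, n_nodes, r):
    """Right-hand side of Eq. (55)."""
    return sum(comb(n_nodes - 1, k) * q(k, l, r) for k in range(n_nodes))


def second_moment(p, n_nodes, r):
    """Direct evaluation of sum_k k^2 psi(k)."""
    possible = comb(n_nodes - 1, r - 1)
    total = Fraction(0)
    for k in range(n_nodes):
        weight = sum(q(k, l, r) * p ** l * (1 - p) ** (possible - l)
                     for l in range(comb(k, r - 1) + 1))
        total += k * k * comb(n_nodes - 1, k) * weight
    return total


def second_moment_closed(p, n_nodes, r):
    """Right-hand side of Eq. (54)."""
    a = comb(n_nodes - 2, r - 2)
    b = comb(n_nodes - 3, r - 2)
    n = n_nodes
    return ((n - 1) ** 2 - (n - 1) * (2 * n - 3) * (1 - p) ** a
            + (n - 1) * (n - 2) * (1 - p) ** (a + b))
```

For $r=2$ the closed form reduces to the second moment of a binomial distribution with $N-1$ trials, which provides an independent consistency check.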



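B. $Q_{r-1}(k,\ell)$ for minimum $\ell=\lceil k/(r-1)\rceil$

The case for $r=3$ was derived in Eqns. (20) and (21), giving

$$
Q_2\!\left(k,\left\lceil\tfrac{k}{2}\right\rceil\right)=
\begin{cases}
\dfrac{k!}{2^{k/2}\left(\frac{k}{2}\right)!}; & k\ \text{even}\\[10pt]
\dfrac{k!}{2^{(k-1)/2}\left(\frac{k-3}{2}\right)!}; & k\ \text{odd}.
\end{cases}\tag{56}
$$

Extending this result to general $r$ is straightforward for the case when $k$ is an exact
multiple of $r-1$, so that $k=j(r-1)$ with $j$ an integer. In this case, each node is part of
a single $(r-1)$-clique, and no cliques overlap. The number $j$ is the exact number of cliques
needed to visit the $k$ nodes. The first $r-1$ nodes are chosen from $k$ in $\binom{k}{r-1}$ ways,
the next nodes are chosen in $\binom{k-(r-1)}{r-1}$, etc. After $j$ steps, and recalling the need
to compensate for the permutation of hyperedges (or cliques), one arrives at

$$
Q_{r-1}\!\left(k,\frac{k}{r-1}\right)
=\frac{1}{\left(\frac{k}{r-1}\right)!}\binom{k}{r-1}\binom{k-(r-1)}{r-1}\cdots\binom{r-1}{r-1}
=\frac{k!}{\left(\frac{k}{r-1}\right)!\,[(r-1)!]^{k/(r-1)}}
\qquad[k/(r-1)\ \text{positive integer}]. \tag{57}
$$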
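Equation (57) counts the partitions of $k$ labelled nodes into $k/(r-1)$ disjoint blocks of size $r-1$, and it can be compared with the general inclusion-exclusion expression for $Q_{r-1}$. The Python sketch below assumes that expression as the reference; function names are illustrative.

```python
from math import comb, factorial


def q(k, l, r):
    """Q_{r-1}(k, l) by inclusion-exclusion (assumed reference form)."""
    return sum((-1) ** (k - m) * comb(k, m) * comb(comb(m, r - 1), l)
               for m in range(k + 1))


def q_exact_multiple(k, r):
    """Closed form of Eq. (57); requires k to be a multiple of r - 1."""
    size = r - 1
    if k <= 0 or k % size != 0:
        raise ValueError("k must be a positive multiple of r - 1")
    blocks = k // size
    return factorial(k) // (factorial(blocks) * factorial(size) ** blocks)
```

For $r=4$ and $k=6$ both routes give 10, the number of ways of splitting six nodes into two disjoint triples.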
The more complicated case emerges when $k=j(r-1)+g$, where both $j$ and $g$ are
positive integers and $1\leq g\leq r-2$, because it means that the $k$ nodes have to be visited
by a total of $j+1$ hyperedges ($\ell$ minimum since $j=\lfloor k/(r-1)\rfloor$). This, however,
allows considerable freedom. Let us enumerate the $j+1$ steps involved in visiting the $k$ nodes
by the index $t$. For $t=1$, exactly $r-1$ nodes are visited. For $t=2$, the second hyperedge can
visit any number of nodes between 1 and $r-1$. Let us define $q_t$ as the difference between
$r-1$ and the number of new nodes visited in step $t$. Note that $q_1=0$ by definition. After
$t=j$ steps have occurred, one finds

$$
\sum_{t=1}^{j}q_t=\sum_{t=2}^{j}q_t\equiv q. \tag{58}
$$

At $t=j$, there are $g+q$ unvisited nodes, which must satisfy $g\leq q+g\leq r-1$ (so the
last hyperedge can visit the remaining unvisited nodes), leading to $0\leq q\leq r-1-g$. To
make use of these facts, one must first calculate the combinatorial weight of a specific set of
values for $q_t$, and then sum over all the choices. The calculation hinges on determining the
combinatorial weight of a single step $t$. At this step,
$(t-1)(r-1)-\sum_{t'=2}^{t-1}q_{t'}$ nodes have already been visited,
$k-(t-1)(r-1)+\sum_{t'=2}^{t-1}q_{t'}$ remain unvisited, and the $t$-th hyperedge visits
$r-1-q_t$ new nodes. This leads to the combinatorial factor

$$
\binom{k-(t-1)(r-1)+\sum_{t'=2}^{t-1}q_{t'}}{(r-1)-q_t}\binom{(t-1)(r-1)-\sum_{t'=2}^{t-1}q_{t'}}{q_t}. \tag{59}
$$

The first of the two binomials counts the choices in picking unused nodes, and the second
counts the choices of previously used nodes. After $j$ steps, the unused nodes equal $g+q$,
and the used nodes are $k-g-q$, and the last hyperedge must pick $r-1-g-q$ from the
latter. Therefore, for a given set $\{q_t\}_{1\leq t\leq j}$, the total number of choices is

$$
\binom{k-g-q}{r-1-g-q}\binom{g+q}{g+q}\prod_{t=1}^{j}
\binom{k-(t-1)(r-1)+\sum_{t'=2}^{t-1}q_{t'}}{(r-1)-q_t}\binom{(t-1)(r-1)-\sum_{t'=2}^{t-1}q_{t'}}{q_t}. \tag{60}
$$

Since $0\leq q\leq r-1-g$, with $g=k-(r-1)\lfloor k/(r-1)\rfloor={\rm mod}(k,r-1)$, and dividing
by the permutations over edges, the total number of choices becomes

$$
\begin{aligned}
Q_{r-1}\!\left(k,\left\lceil\tfrac{k}{r-1}\right\rceil\right)
=\frac{1}{\left\lceil\frac{k}{r-1}\right\rceil!}\sum_{q=0}^{r-1-g}\ \sum_{[q_2+\cdots+q_j=q]}
&\binom{k-g-q}{r-1-g-q}\binom{g+q}{g+q}\\
\times\prod_{t=1}^{j}&\binom{k-(t-1)(r-1)+\sum_{t'=2}^{t-1}q_{t'}}{(r-1)-q_t}\binom{(t-1)(r-1)-\sum_{t'=2}^{t-1}q_{t'}}{q_t}.
\end{aligned}\tag{61}
$$
Expansion of the binomials exposes a $k!$ in the numerator, but it is also multiplied by a factor
for all possible choices of visiting previously used nodes, leading to a combinatorial number
qualitatively similar to $A_1$. One can rewrite the last expression slightly more compactly as

$$
\begin{aligned}
Q_{r-1}\!\left(k,\left\lceil\tfrac{k}{r-1}\right\rceil\right)
=\frac{k!}{\left\lceil\frac{k}{r-1}\right\rceil!\,(k-r+1)!}\sum_{q=0}^{r-1-g}&\frac{1}{(g+q)!\,(r-1-g-q)!}\\
\times\sum_{[q_2+\cdots+q_j=q]}&\binom{k-g-q}{r-1,\,r-1-q_2,\,\ldots,\,r-1-q_j}
\prod_{t=2}^{j}\binom{(t-1)(r-1)-\sum_{t'=2}^{t-1}q_{t'}}{q_t},
\end{aligned}\tag{62}
$$

where the multinomial notation

$$
\binom{k-g-q}{r-1,\,r-1-q_2,\,\ldots,\,r-1-q_j}
=\frac{(k-g-q)!}{(r-1)!\,(r-1-q_2)!\cdots(r-1-q_j)!} \tag{63}
$$

has been used.
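The step-by-step weights of Eqs. (59)-(61) can be accumulated recursively and compared with the inclusion-exclusion value of $Q_{r-1}$ at minimal $\ell$. The Python sketch below follows the assembly described above: the first hyperedge visits $r-1$ new nodes, steps $t=2,\ldots,j$ carry overlaps $q_t$ with running total at most $r-1-g$, and the last hyperedge absorbs the remaining $g+q$ nodes. The reference formula and the function names are assumptions of this sketch.

```python
from math import comb, factorial


def q(k, l, r):
    """Q_{r-1}(k, l) by inclusion-exclusion (assumed reference form)."""
    return sum((-1) ** (k - m) * comb(k, m) * comb(comb(m, r - 1), l)
               for m in range(k + 1))


def q_min_assembly(k, r):
    """Eq. (61) for k = j (r - 1) + g with j >= 1 and 1 <= g <= r - 2."""
    size = r - 1
    j, g = divmod(k, size)
    if j < 1 or g == 0:
        raise ValueError("requires k = j (r - 1) + g with j >= 1, g >= 1")

    def extend(t, visited, q_sum):
        """Summed weight of steps t, ..., j + 1 given the state before step t."""
        if t > j:
            unvisited = k - visited               # equals g + q
            return comb(visited, size - unvisited)
        total = 0
        for q_t in range(size - g - q_sum + 1):
            # Eq. (59): new nodes from the unvisited, q_t from the visited
            weight = comb(k - visited, size - q_t) * comb(visited, q_t)
            total += weight * extend(t + 1, visited + size - q_t, q_sum + q_t)
        return total

    ordered = comb(k, size) * extend(2, size, 0)  # first step has q_1 = 0
    return ordered // factorial(j + 1)
```

For $r=4$ and $k=7$ the three overlap patterns $q_2=0,1,2$ contribute ordered weights 2100, 3150 and 420, giving $5670/3!=945$.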



V. DISCUSSION AND CONCLUSIONS 



In this article, I calculate the neighbor degree distribution for hypergraphs, or the
equivalent problem of graphs that originate as one-mode projections of hypergraphs, giving a
precise count of the number of unique node neighbors that a given node has in any of these
cases. Detailed treatment is presented for uniform rank hypergraphs with homogeneous
hyperedge probability. The relevant qualitative feature of this study is that node overlaps
are properly accounted for, so that no overcounting of neighbors occurs in the distribution.
The sparse and dense limit asymptotics are also presented. These asymptotics provide a
way to determine the errors made in ignoring overlaps, which prove to be asymptotically
small in the sparse limit, but fully dominant in the dense limit. To perform the calculation
of the neighbor distribution, the quantity $Q_{r-1}$ is introduced and studied for the first time,
and its exact formula is provided. It is worth mentioning that the assembly procedure to
calculate $Q_{r-1}$ can be generalized to address the full problem of mixed rank hypergraphs
or even bipartite graphs (this will be addressed in a publication in preparation). It is the
author's belief that this work will prove useful in the analysis of theoretical and empirical
problems of systems in which multiway interactions play a dominant role, thus requiring
hypergraph or bipartite graph representations.
TABLE I: A few values of $Q_2(k,\ell)$ with $\ell_{\min}=\lceil k/2\rceil$ and $\ell_{\max}=\binom{k}{2}$. The notation $Q_2(,+x)=Q_2(k,\ell_{\min}+x)$.



[Figure 1 panels: hypergraph (left), projected network (right).]




FIG. 1: Illustration for the projection V{oij) = Oij from hypergraphs r = 3 to networks. On the 
left, the hypergraph is composed of hyperedges aa,c,b = 1 and aa^b,d = 1- The projected network 
(right) has a link between $i$ and $j$ of weight $w_{ij}$ if there are $w_{ij}$ hyperedges that contain both nodes
$i$ and $j$. In this example, $a$ and $b$ belong to two hyperedges, and thus $w_{a,b}=2$; all other node pairs
belong to a single hyperedge, and thus their weights are equal to 1.
[Figure 2 panels: $k_i=3$, $\ell_i=2$ produced from $p(k_i)=\{a,b,c\}$ (sets $\{a,c,d\}$, $\{a,b,d\}$, and $\{b,c,d\}$ are similar); $k_i=3$, $\ell_i=3$ produced from $p(k_i)=\{a,b,c\}$ (sets $\{a,c,d\}$, $\{a,b,d\}$, and $\{b,c,d\}$ are similar).]

FIG. 2: Illustration ($r=3$) of the emergence of degree $k_i$ as a consequence of various possible
hyperedge configurations. The figure also illustrates $Q_{r-1}(k_i,\ell_i)$. For $p(k_i)=\{a,b,c\}$ (left and
top right panels), node $i$ can be connected to nodes $a,b,c$ in several ways. Hypergraphs $\sigma_1,\sigma_2,\sigma_3$
exhibit the three ways in which $\ell_i=2$ hyperedges can produce the connection between $i$ and all
nodes of $p(k_i)$; $\sigma_4$ represents the single possibility of $\ell_i=3$ to connect $i$ to all nodes in $p(k_i)$. The
successive dots represent other hyperedges not connected to $i$, and hence irrelevant to $i$. From this
example, $Q_{r-1=2}(k_i=3,\ell_i=2)=3$ and $Q_{r-1=2}(k_i=3,\ell_i=3)=1$. Note that in all hypergraphs
$\sigma_1,\sigma_2,\sigma_3$ one node overlaps in two hyperedges and in $\sigma_4$ all three nodes are overlapping in the
appropriate pair of hyperedges. For $p(k_i)=\{a,b,c,d\}$ the degree is $k_i=4$, and if $\ell_i=2$ as
in the bottom right panel, one deduces that $Q_{r-1=2}(k_i=4,\ell_i=2)=3$ (the three hypergraphs
$\sigma_5,\sigma_6,\sigma_7$).
FIG. 3: $\psi_i(k_i,p)$ vs. $k_i$ calculated both from theory (curves) and simulations (symbols): (a)
$N=32$, $r=3$, and $p\approx 0.02$ (○), 0.05 (□) (b) $N=32$, $r=4$, and $p\approx 0.00089$ (○), 0.0024 (□).
[Figure 4 legend: theory; approximations for $g=0$, $g=r-2$, $g=r-3$; simulations.]

FIG. 4: $\psi_i(k_i,p)$ vs. $k_i$ in the sparse limit (close to percolation), and the sparse approximations
$\psi_i(k_i,p,g)$ given in Eq. (8): (a) $N=128$, $r=3$ and $p$ is adjusted to $\langle k\rangle=4$, (b) $N=64$, $r=4$
and $p$ is adjusted to $\langle k\rangle=6$, and (c) $N=2048$, $r=4$ and $p$ is adjusted to $\langle k\rangle=3.5$. In all
plots, the values from Eq. (6) are represented by (○) connected with the thick continuous line,
simulations by (△), and the approximations $\psi_i(k_i,p,g)$ by thin continuous line for $g=0$, dashed
line for $g=r-2$, and dot-dashed line for $g=r-3$. For small systems such as (a) and (b), it is
better to use Eq. (7) for $\psi_i(k_i,p,g)$, since the asymptotic limit is still not fully expressed. However,
in (c) the size is large enough and displays good asymptotics, given by Eq. (8). (d) $\psi_i(k_i,p)$ vs. $k_i$
in the dense limit for $N=64$, $r=3$ and $p=0.1, 0.2$ from Eq. (6) (○) and the dense approximation
Eq. ()lip (D). As p increases the agreement improves. 
[Figure 5 panel label: $\vec{\xi}=(2,2,1,0,1)$; $k=6$; $\ell=5$; $\ell_2=2$, $\ell_1=2$, $\ell_0=1$.]

FIG. 5: Assembly of a conditioned graph contributing to $Q_2(k,\ell)$ for $k=6$, $\ell=5$. In this example,
the assembly history is given by $\vec{\xi}=(2,2,1,0,1)$, representing the fact that the edges added are,
in order of appearance, of types $\ell_2,\ell_2,\ell_1,\ell_0,\ell_1$. At each assembly step $\tau$, the type of edge $\xi_\tau$,
total number of used nodes $u_\tau$, and combinatorial factor are given. The total combinatorics of
assemblies with this same history $\vec{\xi}=(2,2,1,0,1)$ is given by Eq. (24).
FIG. 6: (color online) (a) $\ln Q_2(k,\ell)$ vs. $k,\ell$ and (b) $\ln\!\left[Q_2(k,\ell)/\binom{\binom{k}{2}}{\ell}\right]$ vs. $k,\ell$. From (b) it is clear
that as $\ell$ increases for any $k$, the ratio tends to 1 (with log going to 0).